This document accompanies my PhD thesis titled “Enhancing analysis and interpretation workflows for transcriptome data with an interactive R Bioconductor toolkit”. The document will follow all the analysis and steps documented in my PhD thesis in Section 3.3.

1 About the data

The data illustrated in this document is documented in detail in Section 3.3.1 of my PhD thesis.

The datasets shown is an RNA-seq dataset, available at the Gene Expression Omnibus under the accession code GSE130842.

The data represents a mouse data set of 56 different samples isolated from various murine tissues. In this analysis, I will focus in the Klrg1-negative Nfil3(GFP)-negative Tregs and Klrg1-positive Nfil3(GFP)-positive Tregs samples as discussed with the original authors of the manuscript (Delacher et al. 2020). To analyse the data, the count data was downloaded from the Gene Expression Omnibus (https://www.ncbi.xyz/geo/query/acc.cgi?acc=GSE130842). The count data can be found in file “GSE130842_Count_table_Delacher_et_al_2019.xlsx”. Additionally, a metadata file was set up, containing the sample names, conditions and replicate numbers following the information of the GEO entry. This is the file named “GSE130842_Count_table_Delacher_et_al_2019.csv”

The workflow shown in this document will follow the steps described in our manuscript “Interactive and Reproducible Workflows for Exploring and Modeling RNA-seq Data with pcaExplorer, Ideal, and GeneTonic” (Ludt et al. 2022).

2 Loading required packages

First, load the packages required to perform all the analytic steps presented in this document.

library("DESeq2")
library("topGO")
library("pcaExplorer")
library("ideal")
library("GeneTonic")
library("apeglm")
library("dplyr")
library("readxl")
library("GeDi")
library("org.Mm.eg.db")
library("utils")
library("ComplexHeatmap")
library("readxl")
library("ggplot2")
library("dplyr")
library("forcats")
library("visNetwork")

3 Data processing

First, the data will be preprocessed and saved as an DESeqDataSet object, as described in Section 3.3.1 of my thesis.

# Read the gene expression count data from an Excel file
# The Excel file contains the raw count table for RNA-seq data
data <- readxl::read_excel("GSE130842_Count_table_Delacher_et_al_2019.xlsx")

# Extract the gene names from the first column of the dataset
gene_names <- data$ID  # 'ID' column contains gene names

# Convert the remaining columns (gene counts) to a matrix
# Skip the first column which contains gene names
data <- as.matrix(data[, -1])

# Assign gene names as row names to the count matrix
rownames(data) <- gene_names

# Subset the data for specific columns (samples)
# Columns 33:36 and 41:44 represent selected samples for analysis
data <- data[, c(33:36, 41:44)]

# Read the metadata file that contains information about the samples
# Metadata contains columns describing conditions and replicates
metadata <- utils::read.csv("GSE130842_Metadata.csv") 

# Subset relevant columns from the metadata (conditions and replicate number)
metadata <- metadata[, c(2:3)]

# Create a DESeq2 dataset object from the count matrix and metadata
# This dataset will be used for differential expression analysis
# The design formula (~condition) specifies that the analysis should model the 'condition' variable
dds <- DESeqDataSetFromMatrix(countData = data, 
                              colData = metadata,
                              design = ~condition)

# Estimate the size factors for normalization of the data
# Size factors are used to normalize the sequencing depth across samples
dds <- estimateSizeFactors(dds)

4 Exploratory Data Analysis using pcaExplorer

# Retrieve gene annotation using the OrgDb database for *Mus musculus* (mouse)
# The annotation is matched using the "ENSEMBL" identifier.
anno_df <- pcaExplorer::get_annotation_orgdb(dds, "org.Mm.eg.db", "ENSEMBL")
# Launch the PCAExplorer application to interactively explore principal component analysis (PCA) results.
# The function `pcaExplorer()` is called with the following inputs:
# - dds: A DESeq2 dataset object containing count data and experimental metadata
# - annotation: The annotation data frame (anno_df) containing gene annotations for better interpretation
# This opens a Shiny app that allows for interactive exploration of PCA and downstream analyses.
pcaExplorer(dds = dds, annotation = anno_df)

5 Differential Gene Expression Analysis using ideal and GeneTonic

# Launches the iDEAL Shiny app for interactive DE analysis
# - dds_obj: DESeq2 dataset object used for differential expression analysis
# - annotation_obj: Annotation object containing gene annotations (from `get_annotation_orgdb`)
# The `ideal()` function opens the interactive DE analysis tool that allows for visualization, interpretation, and exploration of DE results.
ideal(dds_obj = dds, annotation_obj = anno_df)
# Reading in the differential expression (DE) gene results from a pre-generated CSV file
# This file contains the DE genes and their associated statistics
de_genes <- read.csv("table_DE_results_ideal.csv")

# Converting the data into a DESeqResults object, which stores DE analysis results
# - res_de: Contains the DE genes and their statistics for downstream analysis
res_de <- DESeqResults(de_genes)
rownames(res_de) <- res_de$X  # Assign rownames for easy indexing later
colnames(res_de)[colnames(res_de) == "symbol"] <- "SYMBOL"

# Reading the functional enrichment results from a CSV file (TopGO results)
# Limiting the results to the top 500 enriched terms for further processing
topGOResults <- read.csv("table_Functional_enrichment_Results.csv")
topGOResults <- topGOResults[c(1:500),]

# This function processes the TopGO results to ensure the format is compatible
# with downstream analysis functions in `GeneTonic`.
# `shake_topGOtableResult` modifies or reorders the table as needed.
topGOResults <- shake_topGOtableResult(topGOResults)

# Aggregating scores for the gene sets and the DE genes, linking DE results with functional enrichment.
# The `get_aggrscores` function calculates aggregated enrichment scores by combining the topGO results
# and DE results (res_de) and integrating gene annotations from the annotation object (anno_df).
res_enrich <- get_aggrscores(topGOResults,
                             res_de,
                             anno_df)

# Creating a GeneTonicList (gtl) object, which consolidates all required input data for `GeneTonic`:
# - dds: DESeq2 dataset object
# - res_de: Differential expression results object
# - res_enrich: Functional enrichment scores calculated from `get_aggrscores`
# - anno_df: Gene annotation information
# The gtl object is necessary for launching the `GeneTonic()` app.
gtl <- GeneTonicList(dds, 
                     res_de, 
                     res_enrich, 
                     anno_df)
# Launch the `GeneTonic()` Shiny app, which provides an interactive platform for 
# exploring and visualizing the results of differential expression and functional enrichment analysis.
GeneTonic(gtl = gtl)

6 Functional Enrichment Result Interpretation using GeDi

GeDi::GeDi(gtl = gtl)

7 Figure Generation

This part will document the code for some of the figures included in my thesis.

# Extract the gene sets from the 'gtl' object, specifically from the enrichment results
genesets <- gtl$res_enrich

# Rename the columns in the 'genesets' data frame so that it 
# can be used directly in `GeDi`
names(genesets)[names(genesets) == "gs_id"] <- "Genesets"
names(genesets)[names(genesets) == "gs_genes"] <- "Genes"
names(genesets)[names(genesets) == "gs_description"] <- "Term"

# Prepare the gene sets for further analysis by extracting and formatting the gene information
genes <- prepareGenesetData(genesets)

# Apply a filter to exclude gene sets that have 200 or more genes
filter <- sapply(genes, function(x) length(x) >= 200)

# Remove the filtered out, large gene sets from the 'genesets' data frame
genesets <- genesets[!filter, ]

# Prepare the filtered gene sets for further analysis by reapplying the data preparation function
genes <- prepareGenesetData(genesets)

7.1 Dendrogram Example of Figure 8 and Network Example of Figure 9

# Create a copy of the 'genesets' data frame to use as example data for further analysis
example_data <- genesets

# Filter the 'example_data' to include only specific gene sets, identified by their GO terms
example_data <-
  example_data[example_data$Genesets %in% c("GO:0051988", "GO:0051987", "GO:0051315", "GO:0007094", "GO:1901970", "GO:1905820", "GO:0051984", "GO:0090267", "GO:0051255",
 "GO:1902412", "GO:0051231"),]

# Prepare the filtered gene sets for further analysis by extracting the gene information
example_genes <- prepareGenesetData(example_data)

# Calculate the Jaccard distance scores for the selected gene sets
jaccard_scores <- getJaccardMatrix(example_genes)

# Set the row and column names of the distance matrix to be the names of the gene sets
rownames(jaccard_scores) <- colnames(jaccard_scores) <- example_data$Genesets

# Generate a dendrogram 
d <- distanceDendro(jaccard_scores)

# Increase the text size of the abscissa labels for better readability and adjust the line width
d <- d + 
  theme(axis.text.x = element_text(size = 10)) +  
  geom_path(linewidth = 5)  

# Display the customized dendrogram plot
d

# Build a graph object from the adjacency matrix derived from the Jaccard scores,
# using a threshold of 0.67 to determine similarity between gene sets
g <- buildGraph(getAdjacencyMatrix(jaccard_scores, 0.67))

# Use 'visNetwork' to visualize the graph,
# set node color to blue and node border and edge color to black
visNetwork::visIgraph(g) %>%
  visNodes(color = list(
            background = "#0092AC",
            border = "#545454" ), 
           font = list(size = 20)) %>% 
  visEdges(width = 3) 

7.2 Graph Network Examples of Figure 10

# Create a copy of the 'genesets' data to be used for further analysis and prepare data
example_data <- genesets
example_genes <- prepareGenesetData(example_data)

# Calculate the Jaccard distance matrix for the gene sets
jaccard_scores <- getJaccardMatrix(example_genes)

# Set the row and column names of the Jaccard distance matrix to be the names of the gene sets
rownames(jaccard_scores) <- colnames(jaccard_scores) <- example_data$Genesets

# Perform Louvain clustering on the Jaccard distance matrix with a threshold of 0.4
louvain <- louvainClustering(jaccard_scores, 0.4)

# Perform Fuzzy clustering on the gene sets 
# Seed clusters are identified with thresholds 0.65 and 0.3, and the final clustering is done with a threshold of 0.5
fuzzy <- fuzzyClustering(seedFinding(jaccard_scores, 0.65, 0.3), 0.5)

# Build a graph object for the Louvain and Fuzzy clustering results
g_Louvain <- buildClusterGraph(louvain, 
                               example_data,
                               example_data$Genesets,
                               color_by = "Cluster")

g_Fuzzy <- buildClusterGraph(fuzzy, 
                             example_data,
                             example_data$Genesets,
                             color_by = "Cluster")

# Visualize the Louvain and Fuzzy clustering graph 
visNetwork::visIgraph(g_Louvain) %>% 
  visNodes(color = list(border = "#545454")) %>% 
  visEdges(color = list(border = "#545454"), width = 10)
visNetwork::visIgraph(g_Fuzzy) %>% 
  visNodes(color = list(border = "#545454")) %>% 
  visEdges(color = list(border = "#545454"), width = 10) 

7.3 Literature Review of Figures 16 to 18

7.3.1 Read in the data

paper_20 <- read_xlsx("LiteratureReviewResults.xlsx", 
                      sheet = "First search, 20 paper")

paper_33 <- read_xlsx("LiteratureReviewResults.xlsx", 
                      sheet = "Second search, 33 paper")

paper_44 <- read_xlsx("LiteratureReviewResults.xlsx", 
                      sheet = "Third search, 44 paper")

7.3.2 Plots on the details on the enrichment analysis

enrich20 <- data.frame(table(paper_20$`Degree of Detail on the Enrichment Method`)) 
colnames(enrich20) <- c("Detail", "Frequency")
ggplot(enrich20,aes(x = reorder(Detail,Frequency), y = Frequency, fill = Detail)) +
  geom_bar(stat ="identity") +
  scale_fill_manual(values = c("#93AA00", "#F8766D", "#D39200", "darkgrey"),
                    aesthetics = "fill",
                    guide = F) +
  labs(x = "", title = "Degree of Detail on the enrichment methods") +
  theme_bw(base_size = 18) +
  coord_flip() + 
  geom_text(
    aes(x = Detail, y = Frequency, label = Frequency), 
    hjust = -0.5, size = 5,
    position = position_dodge(width = 1),
    inherit.aes = TRUE
  ) +
  ylim(c(0, 13))

enrich33 <- data.frame(table(paper_33$`Degree of Detail on the Enrichment Method`)) 
colnames(enrich33) <- c("Detail", "Frequency")
ggplot(enrich33,aes(x = reorder(Detail,Frequency), y = Frequency, fill = Detail)) +
  geom_bar(stat ="identity") +
  scale_fill_manual(values = c("#93AA00", "#F8766D", "#D39200", "darkgrey"),
                    aesthetics = "fill",
                    guide = F) +
  labs(x = "", title = "Degree of Detail on the enrichment methods") +
  theme_bw(base_size = 18) +
  coord_flip() + 
    geom_text(
    aes(x = Detail, y = Frequency, label = Frequency), 
    hjust = -0.5, size = 5,
    position = position_dodge(width = 1),
    inherit.aes = TRUE
  ) +
  ylim(c(0, 16))

enrich44 <- data.frame(table(paper_44$`Degree of Detail on the Enrichment Method`)) 
colnames(enrich44) <- c("Detail", "Frequency")
ggplot(enrich44,aes(x = reorder(Detail,Frequency), y = Frequency, fill = Detail)) +
  geom_bar(stat ="identity") +
  scale_fill_manual(values = c( "#93AA00","#D39200", "darkgrey"),
                    aesthetics = "fill",
                    guide = F) +
  labs(x = "", title = "Degree of Detail on the enrichment methods") +
  theme_bw(base_size = 18) +
  coord_flip() + 
    geom_text(
    aes(x = Detail, y = Frequency, label = Frequency), 
    hjust = -0.5, size = 5,
    position = position_dodge(width = 1),
    inherit.aes = TRUE
  ) +
  ylim(c(0, 25))

7.3.3 Gene Set Libraries Used

gs_Libraries20 <- paper_20$`Gene Set Library used`
gs_Libraries20 <- unlist(strsplit(gs_Libraries20, ",\\s*"))
library_20 <- data.frame(table(gs_Libraries20)) 
colnames(library_20) <- c("Library", "Frequency")
library_20$Library <- as.factor(library_20$Library)

ggplot(library_20, aes(x = reorder(Library,Frequency), y = Frequency, fill = Library)) +
  geom_bar(stat ="identity") +
  scale_fill_manual(values = c("#00C19F", "#F8766D", "#00B9E3",
                               "#D39200","darkgrey", "#93AA00"),
                    aesthetics = "fill",
                    guide = F) +
  labs(x = "", title = "Gene Set Library used") +
  theme_bw(base_size = 18) + 
  coord_flip() + 
    geom_text(
    aes(x = Library, y = Frequency, label = Frequency), 
    hjust = -0.5, size = 5,
    position = position_dodge(width = 1),
    inherit.aes = TRUE
  ) +
  ylim(c(0, 15))

gs_Libraries33 <- paper_33$`Gene Set Library used`
gs_Libraries33 <- unlist(strsplit(gs_Libraries33, ",\\s*"))
library_33 <- data.frame(table(gs_Libraries33)) 
colnames(library_33) <- c("Library", "Frequency")
library_33$Library <- as.factor(library_33$Library)

ggplot(library_33, aes(x = reorder(Library,Frequency), y = Frequency, fill = Library)) +
  geom_bar(stat ="identity") +
  scale_fill_manual(values = c("#93AA00", "#00C19F", "#00B9E3",
                               "#F8766D", "darkgrey","#D39200"),
                    aesthetics = "fill",
                    guide = F) +
  labs(x = "", title = "Gene Set Library used") +
  theme_bw(base_size = 18) + 
  coord_flip() +
     geom_text(
    aes(x = Library, y = Frequency, label = Frequency), 
    hjust = -0.5, size = 5,
    position = position_dodge(width = 1),
    inherit.aes = TRUE
  ) +
  ylim(c(0, 15))

gs_Libraries44 <- paper_44$`Gene Set Library used`
gs_Libraries44 <- unlist(strsplit(gs_Libraries44, ",\\s*"))
library_44 <- data.frame(table(gs_Libraries44)) 
colnames(library_44) <- c("Library", "Frequency")
library_44$Library <- as.factor(library_44$Library)

ggplot(library_44, aes(x = reorder(Library,Frequency), y = Frequency, fill = Library)) +
  geom_bar(stat ="identity") +
  scale_fill_manual(values = c("#F8766D", "#D39200", "#FF65AC", "#93AA00", "#00B9E3",  
                               "#00BA38", "darkgrey","#00C19F" ),
                    aesthetics = "fill",
                    guide = F) +
  labs(x = "", title = "Gene Set Library used") +
  theme_bw(base_size = 18) + 
  coord_flip() +
     geom_text(
    aes(x = Library, y = Frequency, label = Frequency), 
    hjust = -0.5, size = 5,
    position = position_dodge(width = 1),
    inherit.aes = TRUE
  ) +
  ylim(c(0, 25))

7.3.4 Software used

software20 <- paper_20$`Software used`
software20 <- unlist(strsplit(software20, ",\\s*"))
software20 <- data.frame(table(software20)) 
colnames(software20) <- c("Software", "Frequency")

ggplot(software20,aes(x = reorder(Software,Frequency), y = Frequency, fill = Software)) +
  geom_bar(stat ="identity") +
  scale_fill_manual(values = c("#FF65AC", "#F8766D", "#E18A00", "#BE9C00", "#8CAB00",
                               "#00ACFC", "#D575FE", "#24B700", "darkgrey","lightgrey",
                               "#00BE70", "#00C1AB", "#00BBDA"),
                    aesthetics = "fill",
                    guide = F) +
  labs(x = "", title = "Software used") +
  theme_bw(base_size = 18) + 
  coord_flip() +
     geom_text(
    aes(x = Software, y = Frequency, label = Frequency), 
    hjust = -0.5, size = 5,
    position = position_dodge(width = 1),
    inherit.aes = TRUE
  )

software33 <- paper_33$`Software used`
software33 <- unlist(strsplit(software33, ",\\s*"))
software33 <- data.frame(table(software33)) 
colnames(software33) <- c("Software", "Frequency")

ggplot(software33,aes(x = reorder(Software,Frequency), y = Frequency, fill = Software)) +
  geom_bar(stat ="identity") +
  scale_fill_manual(values = c("#00BE70", "#D575FE", "#00C1AB", "#00BBDA","#F8766D",
                               "#E18A00", "#00ACFC", "darkgrey",  "lightgrey","#BE9C00",
                               "#8CAB00", "#24B700"),
                    aesthetics = "fill",
                    guide = F) +
  labs(x = "", title = "Software used") +
  theme_bw(base_size = 18) + 
  coord_flip() +
       geom_text(
    aes(x = Software, y = Frequency, label = Frequency), 
    hjust = -0.5, size = 5,
    position = position_dodge(width = 1),
    inherit.aes = TRUE
  ) +
  ylim(c(0, 15))

software44 <- paper_44$`Software used`
software44 <- unlist(strsplit(software44, ",\\s*"))
software44 <- data.frame(table(software44)) 
colnames(software44) <- c("Software", "Frequency")

ggplot(software44,aes(x = reorder(Software,Frequency), y = Frequency, fill = Software)) +
  geom_bar(stat ="identity") +
  scale_fill_manual(values = c("#FF65AC", "#F8766D", "#00C1AB", "#E18A00", "#BE9C00",
                               "#D575FE", "#00BBDA", "#00ACFC", "darkgrey", "lightgrey", 
                               "#8CAB00", "#24B700","#00BE70"),
                    aesthetics = "fill",
                    guide = F) +
  labs(x = "", title = "Software used") +
  theme_bw(base_size = 18) + 
  coord_flip() +
       geom_text(
    aes(x = Software, y = Frequency, label = Frequency), 
    hjust = -0.5, size = 5,
    position = position_dodge(width = 1),
    inherit.aes = TRUE
  ) +
  ylim(c(0, 25))

7.3.5 Function used

function20 <- data.frame(table(paper_20$`Statistical Test or Function`))
colnames(function20) <- c("Test", "Frequency")

ggplot(function20,aes(x = reorder(Test,Frequency), y = Frequency, fill = Test)) +
  geom_bar(stat ="identity") +
  scale_fill_manual(values = c("#F8766D", "darkgrey", "lightgrey"),
                    aesthetics = "fill",
                    guide = F) +
  labs(x = "", title = "Test/Function used") +
  theme_bw(base_size = 18) + 
  coord_flip() +
       geom_text(
    aes(x = Test, y = Frequency, label = Frequency), 
    hjust = -0.5, size = 5,
    position = position_dodge(width = 1),
    inherit.aes = TRUE
  ) + 
  ylim(c(0, 16))

function33 <- data.frame(table(paper_33$`Statistical Test or Function`))
colnames(function33) <- c("Test", "Frequency")

ggplot(function33,aes(x = reorder(Test,Frequency), y = Frequency, fill = Test)) +
  geom_bar(stat ="identity") +
  scale_fill_manual(values = c("#F8766D", "#E18A00", "darkgrey", "lightgrey"),
                    aesthetics = "fill",
                    guide = F) +
  labs(x = "", title = "Test/Function used") +
  theme_bw(base_size = 18) + 
  coord_flip() +
         geom_text(
    aes(x = Test, y = Frequency, label = Frequency), 
    hjust = -0.5, size = 5,
    position = position_dodge(width = 1),
    inherit.aes = TRUE
  ) +
  ylim(c(0, 19))

function44 <- paper_44$`Statistical Test or Function`
function44 <- unlist(strsplit(function44, ",\\s*"))
function44 <- data.frame(table(function44)) 
colnames(function44) <- c("Test", "Frequency")

ggplot(function44,aes(x = reorder(Test,Frequency), y = Frequency, fill = Test)) +
  geom_bar(stat ="identity") +
  scale_fill_manual(values = c("#F8766D", "#8CAB00", "#E18A00", "#00ACFC",
                               "darkgrey", "lightgrey", "#BE9C00"),
                    aesthetics = "fill",
                    guide = F) +
  labs(x = "", title = "Test/Function used") +
  theme_bw(base_size = 18) + 
  coord_flip() +
         geom_text(
    aes(x = Test, y = Frequency, label = Frequency), 
    hjust = -0.5, size = 5,
    position = position_dodge(width = 1),
    inherit.aes = TRUE
  ) +
  ylim(c(0, 25))

7.3.6 Summary as a Table

table20 <- data.frame(table(paper_20$`Table of Results provided`))
colnames(table20) <- c("Table", "Frequency")
ggplot(table20,aes(x = reorder(Table,Frequency), y = Frequency, fill = Table)) +
  geom_bar(stat ="identity") +
  scale_fill_manual(values = c( "#D39200","darkgrey", "#93AA00"),
                    aesthetics = "fill",
                    guide = F) +
  labs(x = "", title = "Results presented as Table") +
  theme_bw(base_size = 18) +
  coord_flip() +
         geom_text(
    aes(x = Table, y = Frequency, label = Frequency), 
    hjust = -0.5, size = 5,
    position = position_dodge(width = 1),
    inherit.aes = TRUE
  )

table33 <- data.frame(table(paper_33$`Table of Results provided`))
colnames(table33) <- c("Table", "Frequency")
ggplot(table33,aes(x = reorder(Table,Frequency), y = Frequency, fill = Table)) +
  geom_bar(stat ="identity") +
  scale_fill_manual(values = c( "#D39200","darkgrey", "#93AA00"),
                    aesthetics = "fill",
                    guide = F) +
  labs(x = "", title = "Results presented as Table") +
  theme_bw(base_size = 18) +
  coord_flip() +
           geom_text(
    aes(x = Table, y = Frequency, label = Frequency), 
    hjust = -0.5, size = 5,
    position = position_dodge(width = 1),
    inherit.aes = TRUE
  ) +
  ylim(c(0, 15))

table44 <- data.frame(table(paper_44$`Table of Results provided`))
colnames(table44) <- c("Table", "Frequency")
ggplot(table44,aes(x = reorder(Table,Frequency), y = Frequency, fill = Table)) +
  geom_bar(stat ="identity") +
  scale_fill_manual(values = c( "#D39200","darkgrey", "#93AA00"),
                    aesthetics = "fill",
                    guide = F) +
  labs(x = "", title = "Results presented as Table") +
  theme_bw(base_size = 18) +
  coord_flip() +
           geom_text(
    aes(x = Table, y = Frequency, label = Frequency), 
    hjust = -0.5, size = 5,
    position = position_dodge(width = 1),
    inherit.aes = TRUE
  ) +
  ylim(c(0, 25))

7.3.7 Summary presented visually

visual20 <- paper_20$Visualisation
visual20 <- unlist(strsplit(visual20, ",\\s*"))
visual20 <- data.frame(table(visual20)) 
colnames(visual20) <- c("Visualization", "Frequency")

ggplot(visual20,aes(x = reorder(Visualization,Frequency), y = Frequency, fill = Visualization)) +
  geom_bar(stat ="identity") +
  scale_fill_manual(values = c("#F8766D", "#E18A00", "#FF65AC", "#BE9C00", "#00BBDA",
                               "#8CAB00", "#00ACFC", "#24B700", "darkgrey",
                                "#00C1AB" ),
                    aesthetics = "fill",
                    guide = F) +
  labs(x = "", title = "Type of Visualization") +
  theme_bw(base_size = 18) + 
  coord_flip() +
           geom_text(
    aes(x = Visualization, y = Frequency, label = Frequency), 
    hjust = -0.5, size = 5,
    position = position_dodge(width = 1),
    inherit.aes = TRUE
  ) +
  ylim(c(0, 14))

visual33 <- paper_33$Visualisation
visual33 <- unlist(strsplit(visual33, ",\\s*"))
visual33 <- data.frame(table(visual33)) 
colnames(visual33) <- c("Visualization", "Frequency")

ggplot(visual33,aes(x = reorder(Visualization,Frequency), y = Frequency, fill = Visualization)) +
  geom_bar(stat ="identity") +
  scale_fill_manual(values = c("#00ACFC", "#F8766D", "#00BBDA", "#E18A00", "#BE9C00",
                               "#FF65AC", "#8CAB00", "#D575FE" , "#24B700", "#8B93FF",
                               "darkgrey", "#00BE70", "#00C1AB"),
                    aesthetics = "fill",
                    guide = F) +
  labs(x = "", title = "Type of Visualization") +
  theme_bw(base_size = 18) + 
  coord_flip() +
    geom_text(
    aes(x = Visualization, y = Frequency, label = Frequency), 
    hjust = -0.5, size = 5,
    position = position_dodge(width = 1),
    inherit.aes = TRUE
  ) +
  ylim(c(0, 15))

visual44 <- paper_44$Visualisation
visual44 <- unlist(strsplit(visual44, ",\\s*"))
visual44 <- data.frame(table(visual44)) 
colnames(visual44) <- c("Visualization", "Frequency")

ggplot(visual44,aes(x = reorder(Visualization,Frequency), y = Frequency, fill = Visualization)) +
  geom_bar(stat ="identity") +
  scale_fill_manual(values = c("#F8766D", "#00ACFC", "#FF65AC", "#00BE70", 
                               "#D575FE", "#E18A00", "#BE9C00", "#00C1AB",
                               "#00BBDA", "darkgrey", "#8CAB00" ),
                    aesthetics = "fill",
                    guide = F) +
  labs(x = "", title = "Type of Visualization") +
  theme_bw(base_size = 18) + 
  coord_flip() +
    geom_text(
    aes(x = Visualization, y = Frequency, label = Frequency), 
    hjust = -0.5, size = 5,
    position = position_dodge(width = 1),
    inherit.aes = TRUE
  ) +
  ylim(c(0, 25))

7.3.8 Aggregation of Results

aggregation20 <- data.frame(table(paper_20$`Aggregation of Results`))
colnames(aggregation20) <- c("Aggregation", "Frequency")

ggplot(aggregation20,aes(x = reorder(Aggregation,Frequency),
                   y = Frequency, fill = Aggregation)) +
  geom_bar(stat ="identity") +
  scale_fill_manual(values = c( "#93AA00", "darkgrey", "#D39200"),
                    aesthetics = "fill",
                    guide = F) +
  labs(x = "", title = "Aggregation on the Results performed") +
  theme_bw(base_size = 18) +
  coord_flip() +
    geom_text(
    aes(x = Aggregation, y = Frequency, label = Frequency), 
    hjust = -0.5, size = 5,
    position = position_dodge(width = 1),
    inherit.aes = TRUE
  ) +
  ylim(c(0, 15))

aggregation33 <- data.frame(table(paper_33$`Aggregation of Results`))
colnames(aggregation33) <- c("Aggregation", "Frequency")

ggplot(aggregation33,aes(x = reorder(Aggregation,Frequency),
                   y = Frequency, fill = Aggregation)) +
  geom_bar(stat ="identity") +
  scale_fill_manual(values = c("#93AA00", "darkgrey", "#D39200"),
                    aesthetics = "fill",
                    guide = F) +
  labs(x = "", title = "Aggregation on the Results performed") +
  theme_bw(base_size = 18) +
  coord_flip() +
  geom_text(
    aes(x = Aggregation, y = Frequency, label = Frequency), 
    hjust = -0.5, size = 5,
    position = position_dodge(width = 1),
    inherit.aes = TRUE
  ) +
  ylim(c(0, 15))

aggregation44 <- data.frame(table(paper_44$`Aggregation of Results`))
colnames(aggregation44) <- c("Aggregation", "Frequency")

ggplot(aggregation44,aes(x = reorder(Aggregation,Frequency),
                   y = Frequency, fill = Aggregation)) +
  geom_bar(stat ="identity") +
  scale_fill_manual(values = c("#93AA00", "darkgrey", "#D39200"),
                    aesthetics = "fill",
                    guide = F) +
  labs(x = "", title = "Aggregation on the Results performed") +
  theme_bw(base_size = 18) +
  coord_flip() +
  geom_text(
    aes(x = Aggregation, y = Frequency, label = Frequency), 
    hjust = -0.5, size = 5,
    position = position_dodge(width = 1),
    inherit.aes = TRUE
  ) +
  ylim(c(0, 25))

7.3.9 Selection of specific Gene Sets

selection20 <- data.frame(table(paper_20$`Selection of Gene Sets`))
colnames(selection20) <- c("Selection", "Frequency")

ggplot(selection20,aes(x = reorder(Selection,Frequency), y = Frequency, fill = Selection)) +
  geom_bar(stat ="identity") +
  scale_fill_manual(values = c("darkgrey", "#F8766D"),
                    aesthetics = "fill",
                    guide = F) +
  labs(x = "", title = "Only selected Gene Sets presented") +
  theme_bw(base_size = 18) + 
  coord_flip() +
  geom_text(
    aes(x = Selection, y = Frequency, label = Frequency), 
    hjust = -0.5, size = 5,
    position = position_dodge(width = 1),
    inherit.aes = TRUE
  ) +
  ylim(c(0, 17))

selection33 <- data.frame(table(paper_33$`Selection of Gene Sets`))
colnames(selection33) <- c("Selection", "Frequency")

ggplot(selection33,aes(x = reorder(Selection,Frequency), y = Frequency, fill = Selection)) +
  geom_bar(stat ="identity") +
  scale_fill_manual(values = c("darkgrey", "#F8766D"),
                    aesthetics = "fill",
                    guide = F) +
  labs(x = "", title = "Only selected Gene Sets presented") +
  theme_bw(base_size = 18) + 
  coord_flip() +
    geom_text(
    aes(x = Selection, y = Frequency, label = Frequency), 
    hjust = -0.5, size = 5,
    position = position_dodge(width = 1),
    inherit.aes = TRUE
  ) +
  ylim(c(0, 21))

selection44 <- data.frame(table(paper_44$`Selection of Gene Sets`))
colnames(selection44) <- c("Selection", "Frequency")

ggplot(selection44,aes(x = reorder(Selection,Frequency), y = Frequency, fill = Selection)) +
  geom_bar(stat ="identity") +
  scale_fill_manual(values = c( "darkgrey", "#F8766D"),
                    aesthetics = "fill",
                    guide = F) +
  labs(x = "", title = "Only selected Gene Sets presented") +
  theme_bw(base_size = 18) + 
  coord_flip() +
    geom_text(
    aes(x = Selection, y = Frequency, label = Frequency), 
    hjust = -0.5, size = 5,
    position = position_dodge(width = 1),
    inherit.aes = TRUE
  ) +
  ylim(c(0, 25))

7.3.10 Reason for Selection

justification20 <- data.frame(table(paper_20$`Justification of Selection`))
colnames(justification20) <- c("Justification", "Frequency")

ggplot(justification20,aes(x = reorder(Justification,Frequency), y = Frequency, fill = Justification)) +
  geom_bar(stat ="identity") +
  scale_fill_manual(values = c("Top" = "#FF65AC", "Not applicable" = "darkgrey",
                      "Not stated" = "lightgrey", "Rich Factor" =  "#00BBDA",
                      "p value" = "#8CAB00", "Number of DEG" =  "#BE9C00", 
                      "Literature" = "#E18A00", "Highest Enrichment" = "#F8766D"),
    
  
                    aesthetics = "fill",
                    guide = F) +
  labs(x = "", title = "Justification for Selection") +
  theme_bw(base_size = 18) + 
  coord_flip() +
    geom_text(
    aes(x = Justification, y = Frequency, label = Frequency), 
    hjust = -0.5, size = 5,
    position = position_dodge(width = 1),
    inherit.aes = TRUE
  )

justification33 <- data.frame(table(paper_33$`Justification of Selection`))
colnames(justification33) <- c("Justification", "Frequency")

ggplot(justification33,aes(x = reorder(Justification,Frequency), y = Frequency, fill = Justification)) +
  geom_bar(stat ="identity") +
  scale_fill_manual(values = c("#F8766D", "darkgrey", "lightgrey",
                               "#E18A00", "#8CAB00"),
                    aesthetics = "fill",
                    guide = F) +
  labs(x = "", title = "Justification for Selection") +
  theme_bw(base_size = 18) + 
  coord_flip() +
    geom_text(
    aes(x = Justification, y = Frequency, label = Frequency), 
    hjust = -0.5, size = 5,
    position = position_dodge(width = 1),
    inherit.aes = TRUE
  ) +
  ylim(c(0,15))

justification44 <- paper_44$`Justification of Selection`
justification44 <- unlist(strsplit(justification44, ",\\s*"))
justification44 <- data.frame(table(justification44))
colnames(justification44) <- c("Justification", "Frequency")

ggplot(justification44,aes(x = reorder(Justification,Frequency), y = Frequency, fill = Justification)) +
  geom_bar(stat ="identity") +
  scale_fill_manual(values = c("#F8766D", "#E18A00", "#FF65AC", "darkgrey",
                               "lightgrey", "#00ACFC", "#BE9C00", "#8CAB00"),
                    aesthetics = "fill",
                    guide = F) +
  labs(x = "", title = "Justification for Selection") +
  theme_bw(base_size = 18) + 
  coord_flip() +
   geom_text(
    aes(x = Justification, y = Frequency, label = Frequency), 
    hjust = -0.5, size = 5,
    position = position_dodge(width = 1),
    inherit.aes = TRUE
  ) + 
  ylim(c(0, 25))

7.3.11 Summary Figure

columns_to_check <- c(
  "Degree of Detail on the Enrichment Method",
  "Gene Set Library used",
  "Statistical Test or Function",
  "Software used",
  "Table of Results provided",
  "Visualisation",
  "Aggregation of Results",
  "Selection of Gene Sets",
  "Justification of Selection"
)

checkPaperScore <- function(paper, columns){
  n_paper <- nrow(paper)
  summery_score <- c(rep(0, n_paper))
  
  for(i in 1:n_paper) {
  entry <- paper[i, ]
  score <- 0
  
  # Check each column for "Not stated" or "Not applicable"
  for (col in columns_to_check) {
    if(col != "Degree of Detail on the Enrichment Method"){
      if(!is.null(entry[[col]]) && !(entry[[col]] %in% c("Not stated", "Not applicable", "FALSE"))) {
        score <- score + 1
      }
    }else{
      if(!is.null(entry[[col]]) && entry[[col]] == "Detailed") {
        score <- score + 1
      }
    }
    }
  summery_score[i] <- score
  }
  return(summery_score)
}


summery_score_20 <- checkPaperScore(paper_20, columns_to_check)

summery_score_33 <- checkPaperScore(paper_33, columns_to_check)

#columns_to_check <- colnames(paper_44)[6:15]
#columns_to_check[1] <- "Details (at all) on enrichment analysis"

summery_score_44 <- checkPaperScore(paper_44, columns_to_check)

scores <- c(summery_score_20, summery_score_33, summery_score_44)
scores <- as.data.frame(scores[scores != 0])
colnames(scores)<- "Size"

p <- ggplot(scores, aes(x = .data$Size)) +
    geom_histogram(binwidth = 1, fill = "#0092AC") +
    theme_bw() +
  scale_x_continuous(limits = c(0, 9.5), breaks = 0:10) +
  xlab("Completeness Score") +
  ylab("Count")
  theme(panel.grid.major.x = element_blank(),
        axis.text = element_text(size = 5),
        axis.title.x = element_text(size = 7),
        axis.title.y = element_text(size = 7))
## List of 4
##  $ axis.title.x      :List of 11
##   ..$ family       : NULL
##   ..$ face         : NULL
##   ..$ colour       : NULL
##   ..$ size         : num 7
##   ..$ hjust        : NULL
##   ..$ vjust        : NULL
##   ..$ angle        : NULL
##   ..$ lineheight   : NULL
##   ..$ margin       : NULL
##   ..$ debug        : NULL
##   ..$ inherit.blank: logi FALSE
##   ..- attr(*, "class")= chr [1:2] "element_text" "element"
##  $ axis.title.y      :List of 11
##   ..$ family       : NULL
##   ..$ face         : NULL
##   ..$ colour       : NULL
##   ..$ size         : num 7
##   ..$ hjust        : NULL
##   ..$ vjust        : NULL
##   ..$ angle        : NULL
##   ..$ lineheight   : NULL
##   ..$ margin       : NULL
##   ..$ debug        : NULL
##   ..$ inherit.blank: logi FALSE
##   ..- attr(*, "class")= chr [1:2] "element_text" "element"
##  $ axis.text         :List of 11
##   ..$ family       : NULL
##   ..$ face         : NULL
##   ..$ colour       : NULL
##   ..$ size         : num 5
##   ..$ hjust        : NULL
##   ..$ vjust        : NULL
##   ..$ angle        : NULL
##   ..$ lineheight   : NULL
##   ..$ margin       : NULL
##   ..$ debug        : NULL
##   ..$ inherit.blank: logi FALSE
##   ..- attr(*, "class")= chr [1:2] "element_text" "element"
##  $ panel.grid.major.x: list()
##   ..- attr(*, "class")= chr [1:2] "element_blank" "element"
##  - attr(*, "class")= chr [1:2] "theme" "gg"
##  - attr(*, "complete")= logi FALSE
##  - attr(*, "validate")= logi TRUE
p

7.4 Comparison of various Alpha paramters of Figure 37

# Filter out gene sets that have fewer than 10 associated genes
filter <- sapply(genes, function(x) length(x) >= 10)
genesets_small <- genesets[!filter, ]  
genes_small <- prepareGenesetData(genesets_small) 

# Retrieve PPI information for the data 
string_db <- getStringDB(as.numeric(getId("Mus musculus")))
anno_df <- getAnnotation(string_db)
ppi <- getPPI(genes_small, string_db, anno_df)

# Calculate the Meet-Min (MM) distances
mm <- getMeetMinMatrix(genes_small)
# Set the row and column names of the MM matrix to be the names of the gene sets
rownames(mm) <- colnames(mm) <- genesets_small$Genesets

# Create a heatmap of the MM distance matrix without plot labels
d <- distanceHeatmap(mm, plot_labels = FALSE, title = "")

# Draw the heatmap and extract the row order for consistent comparison
ht <- draw(d)

row_order <- row_order(ht)

# Calculate pMM matrices for different alpha values (0, 0.5, and 1)
pMM0 <- getpMMMatrix(genes_small, ppi, alpha = 0)   
rownames(pMM0) <- colnames(pMM0) <- genesets_small$Genesets

pMM5 <- getpMMMatrix(genes_small, ppi, alpha = 0.5)
rownames(pMM5) <- colnames(pMM5) <- genesets_small$Genesets

pMM1 <- getpMMMatrix(genes_small, ppi, alpha = 1)   
rownames(pMM1) <- colnames(pMM1) <- genesets_small$Genesets

# Create and draw a heatmap for the original MM scores, keeping row order consistent
d <- distanceHeatmap(mm[row_order, row_order], 
                     plot_labels = F,
                     cluster_rows = F,
                     cluster_columns = F,
                     title = "")
draw(d)

# Create and draw a heatmap for pMM scores with alpha = 0
d <- distanceHeatmap(pMM0[row_order, row_order],
                     plot_labels = F,
                     cluster_rows = FALSE,
                     cluster_columns = FALSE,
                     title = "")
draw(d)

# Create and draw a heatmap for pMM scores with alpha = 0.5
d <- distanceHeatmap(pMM5[row_order, row_order],
                     plot_labels = F,
                     cluster_rows = FALSE,
                     cluster_columns = FALSE,
                     title = "")
draw(d)

# Create and draw a heatmap for pMM scores with alpha = 1
d <- distanceHeatmap(pMM1[row_order, row_order],
                     plot_labels = F,
                     cluster_rows = FALSE,
                     cluster_columns = FALSE,
                     title = "")
draw(d)

7.5 Word Clouds of Figure 38

# Calculate the Gene Ontology (GO) semantic distance matrix for the gene sets
goDistances <- goDistance(genesets$Genesets, species = "org.Mm.eg.db")

# Perform fuzzy clustering on the GO distance matrix.
fuzzy_clustering <- fuzzyClustering(seedFinding(goDistances, 0.5, 0.5), 0.5)

# Extract a cluster that represent T cell-related gene sets.
# The specific cluster indices were manually determined in a running `GeDi`
tCellCluster <- unique(unlist(fuzzy_clustering[c(7, 8, 14, 17, 20, 24, 28, 30, 45, 49, 50, 58, 62, 64, 65, 70, 72, 76, 81, 85, 92, 96, 103, 105, 106, 112, 114, 121, 124)]))

# Generate a word cloud visualization for the T cell-related gene sets.
enrichmentWordcloud(example_data[tCellCluster, ])
# Extract a cluster that are related to protein membrane processes.
# Again, specific cluster indices are manually selected.
proteinMembraneCluster <- unique(unlist(fuzzy_clustering[c(5, 6, 21, 23, 35, 43, 52, 54, 57, 69, 73, 80, 88, 102, 126)]))

# Generate a word cloud visualization for the protein membrane-related gene sets.
enrichmentWordcloud(example_data[proteinMembraneCluster, ])

7.6 Clustering Results of Figures 39 to 41

# Load precomputed distance matrices for the pMM and GO distance metrics
pMMDistances <- readRDS("pMM_Distances.RDS")
goDistances <- readRDS("GODistances.RDS")

# Apply Louvain clustering algorithm to the pMM distance matrix with clustering threshold 0.5
louvainpMM <- louvainClustering(pMMDistances, 0.5)

# Build a cluster graph from the Louvain clustering results for the pMM distances
g_louvainpMM <- buildClusterGraph(louvainpMM,
                                  genesets,
                                  genesets$Genesets,
                                  color_by = "Cluster")

# Visualize the Louvain clustering graph using the `visNetwork` package
visNetwork::visIgraph(g_louvainpMM) %>% 
  visNodes(color = list(border = "#545454")) %>%
  visEdges(color = list(border = "#545454"), width = 10)
# Apply Fuzzy clustering to the pMM distance matrix and
# visualisize the results afterwards
fuzzypMM <- fuzzyClustering(seedFinding(pMMDistances, 0.5, 0.5), 0.5)

g_fuzzypMM <- buildClusterGraph(fuzzypMM,
                                genesets,
                                genesets$Genesets,
                                color_by = "Cluster")

visNetwork::visIgraph(g_fuzzypMM) %>% 
  visNodes(color = list(border = "#545454")) %>%
  visEdges(color = list(border = "#545454"), width = 5)
# Apply Louvain clustering algorithm to the GO distance matrix
# Again, visualize the results afterwards
for(i in 1:100){
  louvainGO <- louvainClustering(goDistances, 0.5)
  print(paste("Round ", i, " Number Cluster ", length(louvainGO), sep = "" ))
}
## [1] "Round 1 Number Cluster 48"
## [1] "Round 2 Number Cluster 48"
## [1] "Round 3 Number Cluster 48"
## [1] "Round 4 Number Cluster 48"
## [1] "Round 5 Number Cluster 48"
## [1] "Round 6 Number Cluster 48"
## [1] "Round 7 Number Cluster 47"
## [1] "Round 8 Number Cluster 48"
## [1] "Round 9 Number Cluster 48"
## [1] "Round 10 Number Cluster 47"
## [1] "Round 11 Number Cluster 48"
## [1] "Round 12 Number Cluster 48"
## [1] "Round 13 Number Cluster 48"
## [1] "Round 14 Number Cluster 48"
## [1] "Round 15 Number Cluster 48"
## [1] "Round 16 Number Cluster 48"
## [1] "Round 17 Number Cluster 48"
## [1] "Round 18 Number Cluster 48"
## [1] "Round 19 Number Cluster 48"
## [1] "Round 20 Number Cluster 49"
## [1] "Round 21 Number Cluster 47"
## [1] "Round 22 Number Cluster 48"
## [1] "Round 23 Number Cluster 48"
## [1] "Round 24 Number Cluster 48"
## [1] "Round 25 Number Cluster 48"
## [1] "Round 26 Number Cluster 49"
## [1] "Round 27 Number Cluster 48"
## [1] "Round 28 Number Cluster 48"
## [1] "Round 29 Number Cluster 48"
## [1] "Round 30 Number Cluster 48"
## [1] "Round 31 Number Cluster 48"
## [1] "Round 32 Number Cluster 48"
## [1] "Round 33 Number Cluster 48"
## [1] "Round 34 Number Cluster 48"
## [1] "Round 35 Number Cluster 47"
## [1] "Round 36 Number Cluster 48"
## [1] "Round 37 Number Cluster 48"
## [1] "Round 38 Number Cluster 48"
## [1] "Round 39 Number Cluster 47"
## [1] "Round 40 Number Cluster 48"
## [1] "Round 41 Number Cluster 48"
## [1] "Round 42 Number Cluster 48"
## [1] "Round 43 Number Cluster 48"
## [1] "Round 44 Number Cluster 48"
## [1] "Round 45 Number Cluster 48"
## [1] "Round 46 Number Cluster 48"
## [1] "Round 47 Number Cluster 48"
## [1] "Round 48 Number Cluster 48"
## [1] "Round 49 Number Cluster 47"
## [1] "Round 50 Number Cluster 48"
## [1] "Round 51 Number Cluster 48"
## [1] "Round 52 Number Cluster 47"
## [1] "Round 53 Number Cluster 48"
## [1] "Round 54 Number Cluster 48"
## [1] "Round 55 Number Cluster 48"
## [1] "Round 56 Number Cluster 48"
## [1] "Round 57 Number Cluster 47"
## [1] "Round 58 Number Cluster 47"
## [1] "Round 59 Number Cluster 48"
## [1] "Round 60 Number Cluster 48"
## [1] "Round 61 Number Cluster 48"
## [1] "Round 62 Number Cluster 48"
## [1] "Round 63 Number Cluster 47"
## [1] "Round 64 Number Cluster 48"
## [1] "Round 65 Number Cluster 48"
## [1] "Round 66 Number Cluster 47"
## [1] "Round 67 Number Cluster 48"
## [1] "Round 68 Number Cluster 48"
## [1] "Round 69 Number Cluster 48"
## [1] "Round 70 Number Cluster 48"
## [1] "Round 71 Number Cluster 47"
## [1] "Round 72 Number Cluster 49"
## [1] "Round 73 Number Cluster 48"
## [1] "Round 74 Number Cluster 48"
## [1] "Round 75 Number Cluster 48"
## [1] "Round 76 Number Cluster 48"
## [1] "Round 77 Number Cluster 47"
## [1] "Round 78 Number Cluster 48"
## [1] "Round 79 Number Cluster 48"
## [1] "Round 80 Number Cluster 48"
## [1] "Round 81 Number Cluster 47"
## [1] "Round 82 Number Cluster 48"
## [1] "Round 83 Number Cluster 48"
## [1] "Round 84 Number Cluster 48"
## [1] "Round 85 Number Cluster 48"
## [1] "Round 86 Number Cluster 48"
## [1] "Round 87 Number Cluster 48"
## [1] "Round 88 Number Cluster 48"
## [1] "Round 89 Number Cluster 48"
## [1] "Round 90 Number Cluster 48"
## [1] "Round 91 Number Cluster 48"
## [1] "Round 92 Number Cluster 48"
## [1] "Round 93 Number Cluster 48"
## [1] "Round 94 Number Cluster 49"
## [1] "Round 95 Number Cluster 48"
## [1] "Round 96 Number Cluster 48"
## [1] "Round 97 Number Cluster 48"
## [1] "Round 98 Number Cluster 48"
## [1] "Round 99 Number Cluster 48"
## [1] "Round 100 Number Cluster 48"
louvainGO <- louvainClustering(goDistances, 0.5)

g_louvainGO <- buildClusterGraph(louvainGO,
                                 genesets,
                                 genesets$Genesets,
                                 color_by = "Cluster")

visNetwork::visIgraph(g_louvainGO) %>% 
  visNodes(color = list(border = "#545454")) %>%
  visEdges(color = list(border = "#545454"), width = 10)

Session information

sessionInfo()
## R Under development (unstable) (2024-03-12 r86109)
## Platform: x86_64-apple-darwin20
## Running under: macOS Ventura 13.2.1
## 
## Matrix products: default
## BLAS:   /Library/Frameworks/R.framework/Versions/4.4-x86_64/Resources/lib/libRblas.0.dylib 
## LAPACK: /Library/Frameworks/R.framework/Versions/4.4-x86_64/Resources/lib/libRlapack.dylib;  LAPACK version 3.12.0
## 
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
## 
## time zone: Europe/Berlin
## tzcode source: internal
## 
## attached base packages:
## [1] grid      stats4    stats     graphics  grDevices utils     datasets 
## [8] methods   base     
## 
## other attached packages:
##  [1] visNetwork_2.1.2            forcats_1.0.0              
##  [3] ggplot2_3.5.1               ComplexHeatmap_2.22.0      
##  [5] org.Mm.eg.db_3.20.0         GeDi_1.2.0                 
##  [7] readxl_1.4.3                dplyr_1.1.4                
##  [9] apeglm_1.28.0               GeneTonic_3.0.0            
## [11] ideal_2.0.0                 pcaExplorer_3.0.0          
## [13] topGO_2.58.0                SparseM_1.84-2             
## [15] GO.db_3.20.0                AnnotationDbi_1.68.0       
## [17] graph_1.84.0                DESeq2_1.46.0              
## [19] SummarizedExperiment_1.36.0 Biobase_2.66.0             
## [21] MatrixGenerics_1.18.0       matrixStats_1.4.1          
## [23] GenomicRanges_1.58.0        GenomeInfoDb_1.42.0        
## [25] IRanges_2.40.0              S4Vectors_0.44.0           
## [27] BiocGenerics_0.52.0         knitr_1.48                 
## 
## loaded via a namespace (and not attached):
##   [1] R.methodsS3_1.8.2        GSEABase_1.68.0          progress_1.2.3          
##   [4] wordcloud2_0.2.1         DT_0.33                  Biostrings_2.74.0       
##   [7] vctrs_0.6.5              ggtangle_0.0.4           digest_0.6.37           
##  [10] png_0.1-8                shape_1.4.6.1            shinyBS_0.61.1          
##  [13] registry_0.5-1           ggrepel_0.9.6            magick_2.8.5            
##  [16] MASS_7.3-61              reshape2_1.4.4           httpuv_1.6.15           
##  [19] foreach_1.5.2            qvalue_2.38.0            withr_3.0.2             
##  [22] xfun_0.49                ggfun_0.1.7              survival_3.7-0          
##  [25] memoise_2.0.1            clusterProfiler_4.14.0   gson_0.1.0              
##  [28] BiasedUrn_2.0.12         tidytree_0.4.6           GlobalOptions_0.1.2     
##  [31] gtools_3.9.5             R.oo_1.26.0              prettyunits_1.2.0       
##  [34] KEGGREST_1.46.0          promises_1.3.0           httr_1.4.7              
##  [37] restfulr_0.0.15          hash_2.2.6.3             rstudioapi_0.17.1       
##  [40] shinyAce_0.4.3           UCSC.utils_1.2.0         miniUI_0.1.1.1          
##  [43] generics_0.1.3           DOSE_4.0.0               base64enc_0.1-3         
##  [46] curl_5.2.3               zlibbioc_1.52.0          polyclip_1.10-7         
##  [49] ca_0.71.1                GenomeInfoDbData_1.2.13  SparseArray_1.6.0       
##  [52] RBGL_1.82.0              threejs_0.3.3            xtable_1.8-4            
##  [55] stringr_1.5.1            doParallel_1.0.17        evaluate_1.0.1          
##  [58] S4Arrays_1.6.0           BiocFileCache_2.14.0     hms_1.1.3               
##  [61] bookdown_0.41            colorspace_2.1-1         filelock_1.0.3          
##  [64] NLP_0.3-0                shinyWidgets_0.8.7       magrittr_2.0.3          
##  [67] Rgraphviz_2.50.0         later_1.3.2              viridis_0.6.5           
##  [70] ggtree_3.14.0            lattice_0.22-6           NMF_0.28                
##  [73] genefilter_1.88.0        XML_3.99-0.17            cowplot_1.1.3           
##  [76] pillar_1.9.0             nlme_3.1-166             iterators_1.0.14        
##  [79] gridBase_0.4-7           caTools_1.18.3           compiler_4.4.0          
##  [82] stringi_1.8.4            shinycssloaders_1.1.0    Category_2.72.0         
##  [85] TSP_1.2-4                dendextend_1.18.1        GenomicAlignments_1.42.0
##  [88] plyr_1.8.9               crayon_1.5.3             abind_1.4-8             
##  [91] BiocIO_1.16.0            ggdendro_0.2.0           gridGraphics_0.5-1      
##  [94] emdbook_1.3.13           chron_2.3-61             locfit_1.5-9.10         
##  [97] bit_4.5.0                UpSetR_1.4.0             fastmatch_1.1-4         
## [100] codetools_0.2-20         crosstalk_1.2.1          bslib_0.8.0             
## [103] slam_0.1-54              GetoptLong_1.0.5         tm_0.7-14               
## [106] plotly_4.10.4            mime_0.12                mosdef_1.2.0            
## [109] splines_4.4.0            circlize_0.4.16          Rcpp_1.0.13             
## [112] dbplyr_2.5.0             tippy_0.1.0              cellranger_1.1.0        
## [115] blob_1.2.4               utf8_1.2.4               clue_0.3-65             
## [118] fs_1.6.5                 backbone_2.1.4           IHW_1.34.0              
## [121] expm_1.0-0               ggplotify_0.1.2          sqldf_0.4-11            
## [124] tibble_3.2.1             Matrix_1.7-1             statmod_1.5.0           
## [127] tweenr_2.0.3             pkgconfig_2.0.3          pheatmap_1.0.12         
## [130] tools_4.4.0              cachem_1.1.0             RSQLite_2.3.7           
## [133] viridisLite_0.4.2        DBI_1.2.3                numDeriv_2016.8-1.1     
## [136] fastmap_1.2.0            rmarkdown_2.28           scales_1.3.0            
## [139] shinydashboard_0.7.2     Rsamtools_2.22.0         sass_0.4.9              
## [142] patchwork_1.3.0          coda_0.19-4.1            BiocManager_1.30.25     
## [145] fontawesome_0.5.2        farver_2.1.2             mgcv_1.9-1              
## [148] gsubfn_0.7               yaml_2.3.10              AnnotationForge_1.48.0  
## [151] rtracklayer_1.66.0       cli_3.6.3                purrr_1.0.2             
## [154] txdbmaker_1.2.0          webshot_0.5.5            lifecycle_1.0.4         
## [157] mvtnorm_1.3-1            rintrojs_0.3.4           BiocParallel_1.40.0     
## [160] annotate_1.84.0          gtable_0.3.6             rjson_0.2.23            
## [163] ggridges_0.5.6           parallel_4.4.0           ape_5.8                 
## [166] limma_3.62.0             jsonlite_1.8.9           colourpicker_1.3.0      
## [169] seriation_1.5.6          bitops_1.0-9             bit64_4.5.2             
## [172] assertthat_0.2.1         yulab.utils_0.1.7        BiocNeighbors_2.0.0     
## [175] proto_1.0.0              heatmaply_1.5.0          geneLenDataBase_1.42.0  
## [178] bs4Dash_2.3.4            bdsmatrix_1.3-7          highr_0.11              
## [181] jquerylib_0.1.4          GOSemSim_2.32.0          R.utils_2.12.3          
## [184] lazyeval_0.2.2           shiny_1.9.1              dynamicTreeCut_1.63-1   
## [187] htmltools_0.5.8.1        enrichplot_1.26.1        rappdirs_0.3.3          
## [190] STRINGdb_2.18.0          glue_1.8.0               httr2_1.0.5             
## [193] XVector_0.46.0           RCurl_1.98-1.16          treeio_1.30.0           
## [196] ComplexUpset_1.3.3       gridExtra_2.3            igraph_2.1.1            
## [199] R6_2.5.1                 tidyr_1.3.1              gplots_3.2.0            
## [202] fdrtool_1.2.18           labeling_0.4.3           GenomicFeatures_1.58.0  
## [205] cluster_2.1.6            rngtools_1.5.2           bbmle_1.0.25.1          
## [208] aplot_0.2.3              plotrix_3.8-4            DelayedArray_0.32.0     
## [211] tidyselect_1.2.1         GOstats_2.72.0           ggforce_0.4.2           
## [214] xml2_1.3.6               munsell_0.5.1            KernSmooth_2.23-24      
## [217] goseq_1.58.0             BiocStyle_2.34.0         data.table_1.16.2       
## [220] htmlwidgets_1.6.4        fgsea_1.32.0             RColorBrewer_1.1-3      
## [223] biomaRt_2.62.0           rlang_1.1.4              rentrez_1.2.3           
## [226] lpsymphony_1.34.0        Cairo_1.6-2              fansi_1.0.6

Bibliography

Delacher, Michael, Charles D. Imbusch, Agnes Hotz-Wagenblatt, Jan Philipp Mallm, Katharina Bauer, Malte Simon, Dania Riegel, et al. 2020. Precursors for Nonlymphoid-Tissue Treg Cells Reside in Secondary Lymphoid Organs and Are Programmed by the Transcription Factor BATF.” Immunity 52 (2): 295–312.e11. https://doi.org/10.1016/j.immuni.2019.12.002.
Ludt, Annekathrin, Arsenij Ustjanzew, Harald Binder, Konstantin Strauch, and Federico Marini. 2022. Interactive and Reproducible Workflows for Exploring and Modeling RNA-seq Data with pcaExplorer, Ideal, and GeneTonic.” Current Protocols 2 (4): 1–55. https://doi.org/10.1002/cpz1.411.
---
title: >
  Enhancing analysis and interpretation workflows for transcriptome data with an interactive R/Bioconductor toolkit
author:
- name: Annekathrin Silvia Nedwed
  affiliation: 
  - &id1 Institute of Medical Biostatistics, Epidemiology and Informatics (IMBEI), Mainz<br>
  email: anneludt@uni-mainz.de
date: "`r BiocStyle::doc_date()`"
package: "`r BiocStyle::pkg_ver('GeDi')`"
output: 
  bookdown::html_document2:
    toc: true
    toc_float: true
    theme: cosmo
    code_folding: show
    code_download: true
editor_options: 
  chunk_output_type: console
link-citations: true
bibliography: "thesis_supplement.bib"
---

```{r setup, include=FALSE, cache=FALSE, eval = TRUE, echo = FALSE}
library("knitr")
opts_chunk$set(
  fig.align = "center",
  fig.show = "asis",
  eval = TRUE,
  fig.width = 10,
  fig.height = 7,
  tidy = FALSE,
  message = FALSE,
  warning = FALSE,
  size = "small",
  comment = "##",
  echo = TRUE,
  results = "markup"
)
options(replace.assign = TRUE, width = 80)
```


This document accompanies my PhD thesis titled "Enhancing analysis and interpretation workflows for transcriptome data with an interactive R Bioconductor toolkit". The document will follow all the analysis and steps documented in my PhD thesis in Section 3.3.

# About the data

The data illustrated in this document is documented in detail in Section 3.3.1 of my PhD thesis. 

The datasets shown is an RNA-seq dataset, available at the Gene Expression Omnibus under the accession code [GSE130842](https://www.ncbi.xyz/geo/query/acc.cgi?acc=GSE130842).

The data represents a mouse data set of 56 different samples isolated from various murine tissues. In this analysis, I will focus in the Klrg1-negative Nfil3(GFP)-negative Tregs and Klrg1-positive Nfil3(GFP)-positive Tregs samples as discussed with the original authors of the manuscript [@Delacher2020]. To analyse the data, the count data was downloaded from the Gene Expression Omnibus (https://www.ncbi.xyz/geo/query/acc.cgi?acc=GSE130842). The count data can be found in file "GSE130842_Count_table_Delacher_et_al_2019.xlsx". Additionally, a metadata file was set up, containing the sample names, conditions and replicate numbers following the information of the GEO entry. This is the file named "GSE130842_Count_table_Delacher_et_al_2019.csv"

The workflow shown in this document will follow the steps described in our manuscript "Interactive and Reproducible Workflows for Exploring and Modeling RNA-seq Data with pcaExplorer, Ideal, and GeneTonic" [@Ludt2022].

# Loading required packages

First, load the packages required to perform all the analytic steps presented in this document.

```{r loadLibraries, results='hide'}
library("DESeq2")
library("topGO")
library("pcaExplorer")
library("ideal")
library("GeneTonic")
library("apeglm")
library("dplyr")
library("readxl")
library("GeDi")
library("org.Mm.eg.db")
library("utils")
library("ComplexHeatmap")
library("readxl")
library("ggplot2")
library("dplyr")
library("forcats")
library("visNetwork")
```

# Data processing

First, the data will be preprocessed and saved as an `DESeqDataSet` object, as described in Section 3.3.1 of my thesis.

```{r create_dds}
# Read the gene expression count data from an Excel file
# The Excel file contains the raw count table for RNA-seq data
data <- readxl::read_excel("GSE130842_Count_table_Delacher_et_al_2019.xlsx")

# Extract the gene names from the first column of the dataset
gene_names <- data$ID  # 'ID' column contains gene names

# Convert the remaining columns (gene counts) to a matrix
# Skip the first column which contains gene names
data <- as.matrix(data[, -1])

# Assign gene names as row names to the count matrix
rownames(data) <- gene_names

# Subset the data for specific columns (samples)
# Columns 33:36 and 41:44 represent selected samples for analysis
data <- data[, c(33:36, 41:44)]

# Read the metadata file that contains information about the samples
# Metadata contains columns describing conditions and replicates
metadata <- utils::read.csv("GSE130842_Metadata.csv") 

# Subset relevant columns from the metadata (conditions and replicate number)
metadata <- metadata[, c(2:3)]

# Create a DESeq2 dataset object from the count matrix and metadata
# This dataset will be used for differential expression analysis
# The design formula (~condition) specifies that the analysis should model the 'condition' variable
dds <- DESeqDataSetFromMatrix(countData = data, 
                              colData = metadata,
                              design = ~condition)

# Estimate the size factors for normalization of the data
# Size factors are used to normalize the sequencing depth across samples
dds <- estimateSizeFactors(dds)
```

# Exploratory Data Analysis using pcaExplorer 


```{r annodf}
# Retrieve gene annotation using the OrgDb database for *Mus musculus* (mouse)
# The annotation is matched using the "ENSEMBL" identifier.
anno_df <- pcaExplorer::get_annotation_orgdb(dds, "org.Mm.eg.db", "ENSEMBL")
```


```{r pcaExplorer, eval = F}
# Launch the PCAExplorer application to interactively explore principal component analysis (PCA) results.
# The function `pcaExplorer()` is called with the following inputs:
# - dds: A DESeq2 dataset object containing count data and experimental metadata
# - annotation: The annotation data frame (anno_df) containing gene annotations for better interpretation
# This opens a Shiny app that allows for interactive exploration of PCA and downstream analyses.
pcaExplorer(dds = dds, annotation = anno_df)
```


# Differential Gene Expression Analysis using ideal and GeneTonic


```{r de-ideal, eval = F}
# Launches the iDEAL Shiny app for interactive DE analysis
# - dds_obj: DESeq2 dataset object used for differential expression analysis
# - annotation_obj: Annotation object containing gene annotations (from `get_annotation_orgdb`)
# The `ideal()` function opens the interactive DE analysis tool that allows for visualization, interpretation, and exploration of DE results.
ideal(dds_obj = dds, annotation_obj = anno_df)
```



```{r de-GeneTonic}
# Reading in the differential expression (DE) gene results from a pre-generated CSV file
# This file contains the DE genes and their associated statistics
de_genes <- read.csv("table_DE_results_ideal.csv")

# Converting the data into a DESeqResults object, which stores DE analysis results
# - res_de: Contains the DE genes and their statistics for downstream analysis
res_de <- DESeqResults(de_genes)
rownames(res_de) <- res_de$X  # Assign rownames for easy indexing later
colnames(res_de)[colnames(res_de) == "symbol"] <- "SYMBOL"

# Reading the functional enrichment results from a CSV file (TopGO results)
# Limiting the results to the top 500 enriched terms for further processing
topGOResults <- read.csv("table_Functional_enrichment_Results.csv")
topGOResults <- topGOResults[c(1:500),]

# This function processes the TopGO results to ensure the format is compatible
# with downstream analysis functions in `GeneTonic`.
# `shake_topGOtableResult` modifies or reorders the table as needed.
topGOResults <- shake_topGOtableResult(topGOResults)

# Aggregating scores for the gene sets and the DE genes, linking DE results with functional enrichment.
# The `get_aggrscores` function calculates aggregated enrichment scores by combining the topGO results
# and DE results (res_de) and integrating gene annotations from the annotation object (anno_df).
res_enrich <- get_aggrscores(topGOResults,
                             res_de,
                             anno_df)

# Creating a GeneTonicList (gtl) object, which consolidates all required input data for `GeneTonic`:
# - dds: DESeq2 dataset object
# - res_de: Differential expression results object
# - res_enrich: Functional enrichment scores calculated from `get_aggrscores`
# - anno_df: Gene annotation information
# The gtl object is necessary for launching the `GeneTonic()` app.
gtl <- GeneTonicList(dds, 
                     res_de, 
                     res_enrich, 
                     anno_df)
```

```{r LaunchGeneTonic, eval = F}
# Launch the `GeneTonic()` Shiny app, which provides an interactive platform for 
# exploring and visualizing the results of differential expression and functional enrichment analysis.
GeneTonic(gtl = gtl)
```


# Functional Enrichment Result Interpretation using GeDi


```{r GeDi, eval = F}
GeDi::GeDi(gtl = gtl)
```


# Figure Generation

This part will document the code for some of the figures included in my thesis. 

```{r}
# Extract the gene sets from the 'gtl' object, specifically from the enrichment results
genesets <- gtl$res_enrich

# Rename the columns in the 'genesets' data frame so that it 
# can be used directly in `GeDi`
names(genesets)[names(genesets) == "gs_id"] <- "Genesets"
names(genesets)[names(genesets) == "gs_genes"] <- "Genes"
names(genesets)[names(genesets) == "gs_description"] <- "Term"

# Prepare the gene sets for further analysis by extracting and formatting the gene information
genes <- prepareGenesetData(genesets)

# Apply a filter to exclude gene sets that have 200 or more genes
filter <- sapply(genes, function(x) length(x) >= 200)

# Remove the filtered out, large gene sets from the 'genesets' data frame
genesets <- genesets[!filter, ]

# Prepare the filtered gene sets for further analysis by reapplying the data preparation function
genes <- prepareGenesetData(genesets)
```


## Dendrogram Example of Figure 8 and Network Example of Figure 9

```{r}
# Create a copy of the 'genesets' data frame to use as example data for further analysis
example_data <- genesets

# Filter the 'example_data' to include only specific gene sets, identified by their GO terms
example_data <-
  example_data[example_data$Genesets %in% c("GO:0051988", "GO:0051987", "GO:0051315", "GO:0007094", "GO:1901970", "GO:1905820", "GO:0051984", "GO:0090267", "GO:0051255",
 "GO:1902412", "GO:0051231"),]

# Prepare the filtered gene sets for further analysis by extracting the gene information
example_genes <- prepareGenesetData(example_data)

# Calculate the Jaccard distance scores for the selected gene sets
jaccard_scores <- getJaccardMatrix(example_genes)

# Set the row and column names of the distance matrix to be the names of the gene sets
rownames(jaccard_scores) <- colnames(jaccard_scores) <- example_data$Genesets

# Generate a dendrogram 
d <- distanceDendro(jaccard_scores)

# Increase the text size of the abscissa labels for better readability and adjust the line width
d <- d + 
  theme(axis.text.x = element_text(size = 10)) +  
  geom_path(linewidth = 5)  

# Display the customized dendrogram plot
d

# Build a graph object from the adjacency matrix derived from the Jaccard scores,
# using a threshold of 0.67 to determine similarity between gene sets
g <- buildGraph(getAdjacencyMatrix(jaccard_scores, 0.67))

# Use 'visNetwork' to visualize the graph,
# set node color to blue and node border and edge color to black
visNetwork::visIgraph(g) %>%
  visNodes(color = list(
            background = "#0092AC",
            border = "#545454" ), 
           font = list(size = 20)) %>% 
  visEdges(width = 3) 
```


## Graph Network Examples of Figure 10

```{r}
# Create a copy of the 'genesets' data to be used for further analysis and prepare data
example_data <- genesets
example_genes <- prepareGenesetData(example_data)

# Calculate the Jaccard distance matrix for the gene sets
jaccard_scores <- getJaccardMatrix(example_genes)

# Set the row and column names of the Jaccard distance matrix to be the names of the gene sets
rownames(jaccard_scores) <- colnames(jaccard_scores) <- example_data$Genesets

# Perform Louvain clustering on the Jaccard distance matrix with a threshold of 0.4
louvain <- louvainClustering(jaccard_scores, 0.4)

# Perform Fuzzy clustering on the gene sets 
# Seed clusters are identified with thresholds 0.65 and 0.3, and the final clustering is done with a threshold of 0.5
fuzzy <- fuzzyClustering(seedFinding(jaccard_scores, 0.65, 0.3), 0.5)

# Build a graph object for the Louvain and Fuzzy clustering results
g_Louvain <- buildClusterGraph(louvain, 
                               example_data,
                               example_data$Genesets,
                               color_by = "Cluster")

g_Fuzzy <- buildClusterGraph(fuzzy, 
                             example_data,
                             example_data$Genesets,
                             color_by = "Cluster")

# Visualize the Louvain and Fuzzy clustering graph 
visNetwork::visIgraph(g_Louvain) %>% 
  visNodes(color = list(border = "#545454")) %>% 
  visEdges(color = list(border = "#545454"), width = 10)

visNetwork::visIgraph(g_Fuzzy) %>% 
  visNodes(color = list(border = "#545454")) %>% 
  visEdges(color = list(border = "#545454"), width = 10) 
```

## Literature Review of Figures 16 to 18

### Read in the data

```{r}
paper_20 <- read_xlsx("LiteratureReviewResults.xlsx", 
                      sheet = "First search, 20 paper")

paper_33 <- read_xlsx("LiteratureReviewResults.xlsx", 
                      sheet = "Second search, 33 paper")

paper_44 <- read_xlsx("LiteratureReviewResults.xlsx", 
                      sheet = "Third search, 44 paper")
```

### Plots on the details on the enrichment analysis

```{r}
enrich20 <- data.frame(table(paper_20$`Degree of Detail on the Enrichment Method`)) 
colnames(enrich20) <- c("Detail", "Frequency")
ggplot(enrich20,aes(x = reorder(Detail,Frequency), y = Frequency, fill = Detail)) +
  geom_bar(stat ="identity") +
  scale_fill_manual(values = c("#93AA00", "#F8766D", "#D39200", "darkgrey"),
                    aesthetics = "fill",
                    guide = F) +
  labs(x = "", title = "Degree of Detail on the enrichment methods") +
  theme_bw(base_size = 18) +
  coord_flip() + 
  geom_text(
    aes(x = Detail, y = Frequency, label = Frequency), 
    hjust = -0.5, size = 5,
    position = position_dodge(width = 1),
    inherit.aes = TRUE
  ) +
  ylim(c(0, 13))


enrich33 <- data.frame(table(paper_33$`Degree of Detail on the Enrichment Method`)) 
colnames(enrich33) <- c("Detail", "Frequency")
ggplot(enrich33,aes(x = reorder(Detail,Frequency), y = Frequency, fill = Detail)) +
  geom_bar(stat ="identity") +
  scale_fill_manual(values = c("#93AA00", "#F8766D", "#D39200", "darkgrey"),
                    aesthetics = "fill",
                    guide = F) +
  labs(x = "", title = "Degree of Detail on the enrichment methods") +
  theme_bw(base_size = 18) +
  coord_flip() + 
    geom_text(
    aes(x = Detail, y = Frequency, label = Frequency), 
    hjust = -0.5, size = 5,
    position = position_dodge(width = 1),
    inherit.aes = TRUE
  ) +
  ylim(c(0, 16))

enrich44 <- data.frame(table(paper_44$`Degree of Detail on the Enrichment Method`)) 
colnames(enrich44) <- c("Detail", "Frequency")
ggplot(enrich44,aes(x = reorder(Detail,Frequency), y = Frequency, fill = Detail)) +
  geom_bar(stat ="identity") +
  scale_fill_manual(values = c( "#93AA00","#D39200", "darkgrey"),
                    aesthetics = "fill",
                    guide = F) +
  labs(x = "", title = "Degree of Detail on the enrichment methods") +
  theme_bw(base_size = 18) +
  coord_flip() + 
    geom_text(
    aes(x = Detail, y = Frequency, label = Frequency), 
    hjust = -0.5, size = 5,
    position = position_dodge(width = 1),
    inherit.aes = TRUE
  ) +
  ylim(c(0, 25))

```

### Gene Set Libraries Used

```{r}
gs_Libraries20 <- paper_20$`Gene Set Library used`
gs_Libraries20 <- unlist(strsplit(gs_Libraries20, ",\\s*"))
library_20 <- data.frame(table(gs_Libraries20)) 
colnames(library_20) <- c("Library", "Frequency")
library_20$Library <- as.factor(library_20$Library)

ggplot(library_20, aes(x = reorder(Library,Frequency), y = Frequency, fill = Library)) +
  geom_bar(stat ="identity") +
  scale_fill_manual(values = c("#00C19F", "#F8766D", "#00B9E3",
                               "#D39200","darkgrey", "#93AA00"),
                    aesthetics = "fill",
                    guide = F) +
  labs(x = "", title = "Gene Set Library used") +
  theme_bw(base_size = 18) + 
  coord_flip() + 
    geom_text(
    aes(x = Library, y = Frequency, label = Frequency), 
    hjust = -0.5, size = 5,
    position = position_dodge(width = 1),
    inherit.aes = TRUE
  ) +
  ylim(c(0, 15))

gs_Libraries33 <- paper_33$`Gene Set Library used`
gs_Libraries33 <- unlist(strsplit(gs_Libraries33, ",\\s*"))
library_33 <- data.frame(table(gs_Libraries33)) 
colnames(library_33) <- c("Library", "Frequency")
library_33$Library <- as.factor(library_33$Library)

ggplot(library_33, aes(x = reorder(Library,Frequency), y = Frequency, fill = Library)) +
  geom_bar(stat ="identity") +
  scale_fill_manual(values = c("#93AA00", "#00C19F", "#00B9E3",
                               "#F8766D", "darkgrey","#D39200"),
                    aesthetics = "fill",
                    guide = F) +
  labs(x = "", title = "Gene Set Library used") +
  theme_bw(base_size = 18) + 
  coord_flip() +
     geom_text(
    aes(x = Library, y = Frequency, label = Frequency), 
    hjust = -0.5, size = 5,
    position = position_dodge(width = 1),
    inherit.aes = TRUE
  ) +
  ylim(c(0, 15))

gs_Libraries44 <- paper_44$`Gene Set Library used`
gs_Libraries44 <- unlist(strsplit(gs_Libraries44, ",\\s*"))
library_44 <- data.frame(table(gs_Libraries44)) 
colnames(library_44) <- c("Library", "Frequency")
library_44$Library <- as.factor(library_44$Library)

ggplot(library_44, aes(x = reorder(Library,Frequency), y = Frequency, fill = Library)) +
  geom_bar(stat ="identity") +
  scale_fill_manual(values = c("#F8766D", "#D39200", "#FF65AC", "#93AA00", "#00B9E3",  
                               "#00BA38", "darkgrey","#00C19F" ),
                    aesthetics = "fill",
                    guide = F) +
  labs(x = "", title = "Gene Set Library used") +
  theme_bw(base_size = 18) + 
  coord_flip() +
     geom_text(
    aes(x = Library, y = Frequency, label = Frequency), 
    hjust = -0.5, size = 5,
    position = position_dodge(width = 1),
    inherit.aes = TRUE
  ) +
  ylim(c(0, 25))

```


### Software used

```{r}
software20 <- paper_20$`Software used`
software20 <- unlist(strsplit(software20, ",\\s*"))
software20 <- data.frame(table(software20)) 
colnames(software20) <- c("Software", "Frequency")

ggplot(software20,aes(x = reorder(Software,Frequency), y = Frequency, fill = Software)) +
  geom_bar(stat ="identity") +
  scale_fill_manual(values = c("#FF65AC", "#F8766D", "#E18A00", "#BE9C00", "#8CAB00",
                               "#00ACFC", "#D575FE", "#24B700", "darkgrey","lightgrey",
                               "#00BE70", "#00C1AB", "#00BBDA"),
                    aesthetics = "fill",
                    guide = F) +
  labs(x = "", title = "Software used") +
  theme_bw(base_size = 18) + 
  coord_flip() +
     geom_text(
    aes(x = Software, y = Frequency, label = Frequency), 
    hjust = -0.5, size = 5,
    position = position_dodge(width = 1),
    inherit.aes = TRUE
  )

software33 <- paper_33$`Software used`
software33 <- unlist(strsplit(software33, ",\\s*"))
software33 <- data.frame(table(software33)) 
colnames(software33) <- c("Software", "Frequency")

ggplot(software33,aes(x = reorder(Software,Frequency), y = Frequency, fill = Software)) +
  geom_bar(stat ="identity") +
  scale_fill_manual(values = c("#00BE70", "#D575FE", "#00C1AB", "#00BBDA","#F8766D",
                               "#E18A00", "#00ACFC", "darkgrey",  "lightgrey","#BE9C00",
                               "#8CAB00", "#24B700"),
                    aesthetics = "fill",
                    guide = F) +
  labs(x = "", title = "Software used") +
  theme_bw(base_size = 18) + 
  coord_flip() +
       geom_text(
    aes(x = Software, y = Frequency, label = Frequency), 
    hjust = -0.5, size = 5,
    position = position_dodge(width = 1),
    inherit.aes = TRUE
  ) +
  ylim(c(0, 15))

software44 <- paper_44$`Software used`
software44 <- unlist(strsplit(software44, ",\\s*"))
software44 <- data.frame(table(software44)) 
colnames(software44) <- c("Software", "Frequency")

ggplot(software44,aes(x = reorder(Software,Frequency), y = Frequency, fill = Software)) +
  geom_bar(stat ="identity") +
  scale_fill_manual(values = c("#FF65AC", "#F8766D", "#00C1AB", "#E18A00", "#BE9C00",
                               "#D575FE", "#00BBDA", "#00ACFC", "darkgrey", "lightgrey", 
                               "#8CAB00", "#24B700","#00BE70"),
                    aesthetics = "fill",
                    guide = F) +
  labs(x = "", title = "Software used") +
  theme_bw(base_size = 18) + 
  coord_flip() +
       geom_text(
    aes(x = Software, y = Frequency, label = Frequency), 
    hjust = -0.5, size = 5,
    position = position_dodge(width = 1),
    inherit.aes = TRUE
  ) +
  ylim(c(0, 25))

```

### Function used

```{r}
function20 <- data.frame(table(paper_20$`Statistical Test or Function`))
colnames(function20) <- c("Test", "Frequency")

ggplot(function20,aes(x = reorder(Test,Frequency), y = Frequency, fill = Test)) +
  geom_bar(stat ="identity") +
  scale_fill_manual(values = c("#F8766D", "darkgrey", "lightgrey"),
                    aesthetics = "fill",
                    guide = F) +
  labs(x = "", title = "Test/Function used") +
  theme_bw(base_size = 18) + 
  coord_flip() +
       geom_text(
    aes(x = Test, y = Frequency, label = Frequency), 
    hjust = -0.5, size = 5,
    position = position_dodge(width = 1),
    inherit.aes = TRUE
  ) + 
  ylim(c(0, 16))

function33 <- data.frame(table(paper_33$`Statistical Test or Function`))
colnames(function33) <- c("Test", "Frequency")

ggplot(function33,aes(x = reorder(Test,Frequency), y = Frequency, fill = Test)) +
  geom_bar(stat ="identity") +
  scale_fill_manual(values = c("#F8766D", "#E18A00", "darkgrey", "lightgrey"),
                    aesthetics = "fill",
                    guide = F) +
  labs(x = "", title = "Test/Function used") +
  theme_bw(base_size = 18) + 
  coord_flip() +
         geom_text(
    aes(x = Test, y = Frequency, label = Frequency), 
    hjust = -0.5, size = 5,
    position = position_dodge(width = 1),
    inherit.aes = TRUE
  ) +
  ylim(c(0, 19))


function44 <- paper_44$`Statistical Test or Function`
function44 <- unlist(strsplit(function44, ",\\s*"))
function44 <- data.frame(table(function44)) 
colnames(function44) <- c("Test", "Frequency")

ggplot(function44,aes(x = reorder(Test,Frequency), y = Frequency, fill = Test)) +
  geom_bar(stat ="identity") +
  scale_fill_manual(values = c("#F8766D", "#8CAB00", "#E18A00", "#00ACFC",
                               "darkgrey", "lightgrey", "#BE9C00"),
                    aesthetics = "fill",
                    guide = F) +
  labs(x = "", title = "Test/Function used") +
  theme_bw(base_size = 18) + 
  coord_flip() +
         geom_text(
    aes(x = Test, y = Frequency, label = Frequency), 
    hjust = -0.5, size = 5,
    position = position_dodge(width = 1),
    inherit.aes = TRUE
  ) +
  ylim(c(0, 25))
```



### Summary as a Table

```{r}
table20 <- data.frame(table(paper_20$`Table of Results provided`))
colnames(table20) <- c("Table", "Frequency")
ggplot(table20,aes(x = reorder(Table,Frequency), y = Frequency, fill = Table)) +
  geom_bar(stat ="identity") +
  scale_fill_manual(values = c( "#D39200","darkgrey", "#93AA00"),
                    aesthetics = "fill",
                    guide = F) +
  labs(x = "", title = "Results presented as Table") +
  theme_bw(base_size = 18) +
  coord_flip() +
         geom_text(
    aes(x = Table, y = Frequency, label = Frequency), 
    hjust = -0.5, size = 5,
    position = position_dodge(width = 1),
    inherit.aes = TRUE
  )

table33 <- data.frame(table(paper_33$`Table of Results provided`))
colnames(table33) <- c("Table", "Frequency")
ggplot(table33,aes(x = reorder(Table,Frequency), y = Frequency, fill = Table)) +
  geom_bar(stat ="identity") +
  scale_fill_manual(values = c( "#D39200","darkgrey", "#93AA00"),
                    aesthetics = "fill",
                    guide = F) +
  labs(x = "", title = "Results presented as Table") +
  theme_bw(base_size = 18) +
  coord_flip() +
           geom_text(
    aes(x = Table, y = Frequency, label = Frequency), 
    hjust = -0.5, size = 5,
    position = position_dodge(width = 1),
    inherit.aes = TRUE
  ) +
  ylim(c(0, 15))

table44 <- data.frame(table(paper_44$`Table of Results provided`))
colnames(table44) <- c("Table", "Frequency")
ggplot(table44,aes(x = reorder(Table,Frequency), y = Frequency, fill = Table)) +
  geom_bar(stat ="identity") +
  scale_fill_manual(values = c( "#D39200","darkgrey", "#93AA00"),
                    aesthetics = "fill",
                    guide = F) +
  labs(x = "", title = "Results presented as Table") +
  theme_bw(base_size = 18) +
  coord_flip() +
           geom_text(
    aes(x = Table, y = Frequency, label = Frequency), 
    hjust = -0.5, size = 5,
    position = position_dodge(width = 1),
    inherit.aes = TRUE
  ) +
  ylim(c(0, 25))

```

### Summary presented visually

```{r}
visual20 <- paper_20$Visualisation
visual20 <- unlist(strsplit(visual20, ",\\s*"))
visual20 <- data.frame(table(visual20)) 
colnames(visual20) <- c("Visualization", "Frequency")

ggplot(visual20,aes(x = reorder(Visualization,Frequency), y = Frequency, fill = Visualization)) +
  geom_bar(stat ="identity") +
  scale_fill_manual(values = c("#F8766D", "#E18A00", "#FF65AC", "#BE9C00", "#00BBDA",
                               "#8CAB00", "#00ACFC", "#24B700", "darkgrey",
                                "#00C1AB" ),
                    aesthetics = "fill",
                    guide = F) +
  labs(x = "", title = "Type of Visualization") +
  theme_bw(base_size = 18) + 
  coord_flip() +
           geom_text(
    aes(x = Visualization, y = Frequency, label = Frequency), 
    hjust = -0.5, size = 5,
    position = position_dodge(width = 1),
    inherit.aes = TRUE
  ) +
  ylim(c(0, 14))

visual33 <- paper_33$Visualisation
visual33 <- unlist(strsplit(visual33, ",\\s*"))
visual33 <- data.frame(table(visual33)) 
colnames(visual33) <- c("Visualization", "Frequency")

ggplot(visual33,aes(x = reorder(Visualization,Frequency), y = Frequency, fill = Visualization)) +
  geom_bar(stat ="identity") +
  scale_fill_manual(values = c("#00ACFC", "#F8766D", "#00BBDA", "#E18A00", "#BE9C00",
                               "#FF65AC", "#8CAB00", "#D575FE" , "#24B700", "#8B93FF",
                               "darkgrey", "#00BE70", "#00C1AB"),
                    aesthetics = "fill",
                    guide = F) +
  labs(x = "", title = "Type of Visualization") +
  theme_bw(base_size = 18) + 
  coord_flip() +
    geom_text(
    aes(x = Visualization, y = Frequency, label = Frequency), 
    hjust = -0.5, size = 5,
    position = position_dodge(width = 1),
    inherit.aes = TRUE
  ) +
  ylim(c(0, 15))

visual44 <- paper_44$Visualisation
visual44 <- unlist(strsplit(visual44, ",\\s*"))
visual44 <- data.frame(table(visual44)) 
colnames(visual44) <- c("Visualization", "Frequency")

ggplot(visual44,aes(x = reorder(Visualization,Frequency), y = Frequency, fill = Visualization)) +
  geom_bar(stat ="identity") +
  scale_fill_manual(values = c("#F8766D", "#00ACFC", "#FF65AC", "#00BE70", 
                               "#D575FE", "#E18A00", "#BE9C00", "#00C1AB",
                               "#00BBDA", "darkgrey", "#8CAB00" ),
                    aesthetics = "fill",
                    guide = F) +
  labs(x = "", title = "Type of Visualization") +
  theme_bw(base_size = 18) + 
  coord_flip() +
    geom_text(
    aes(x = Visualization, y = Frequency, label = Frequency), 
    hjust = -0.5, size = 5,
    position = position_dodge(width = 1),
    inherit.aes = TRUE
  ) +
  ylim(c(0, 25))

```

### Aggregation of Results

```{r}
aggregation20 <- data.frame(table(paper_20$`Aggregation of Results`))
colnames(aggregation20) <- c("Aggregation", "Frequency")

ggplot(aggregation20,aes(x = reorder(Aggregation,Frequency),
                   y = Frequency, fill = Aggregation)) +
  geom_bar(stat ="identity") +
  scale_fill_manual(values = c( "#93AA00", "darkgrey", "#D39200"),
                    aesthetics = "fill",
                    guide = F) +
  labs(x = "", title = "Aggregation on the Results performed") +
  theme_bw(base_size = 18) +
  coord_flip() +
    geom_text(
    aes(x = Aggregation, y = Frequency, label = Frequency), 
    hjust = -0.5, size = 5,
    position = position_dodge(width = 1),
    inherit.aes = TRUE
  ) +
  ylim(c(0, 15))

aggregation33 <- data.frame(table(paper_33$`Aggregation of Results`))
colnames(aggregation33) <- c("Aggregation", "Frequency")

ggplot(aggregation33,aes(x = reorder(Aggregation,Frequency),
                   y = Frequency, fill = Aggregation)) +
  geom_bar(stat ="identity") +
  scale_fill_manual(values = c("#93AA00", "darkgrey", "#D39200"),
                    aesthetics = "fill",
                    guide = F) +
  labs(x = "", title = "Aggregation on the Results performed") +
  theme_bw(base_size = 18) +
  coord_flip() +
  geom_text(
    aes(x = Aggregation, y = Frequency, label = Frequency), 
    hjust = -0.5, size = 5,
    position = position_dodge(width = 1),
    inherit.aes = TRUE
  ) +
  ylim(c(0, 15))

aggregation44 <- data.frame(table(paper_44$`Aggregation of Results`))
colnames(aggregation44) <- c("Aggregation", "Frequency")

ggplot(aggregation44,aes(x = reorder(Aggregation,Frequency),
                   y = Frequency, fill = Aggregation)) +
  geom_bar(stat ="identity") +
  scale_fill_manual(values = c("#93AA00", "darkgrey", "#D39200"),
                    aesthetics = "fill",
                    guide = F) +
  labs(x = "", title = "Aggregation on the Results performed") +
  theme_bw(base_size = 18) +
  coord_flip() +
  geom_text(
    aes(x = Aggregation, y = Frequency, label = Frequency), 
    hjust = -0.5, size = 5,
    position = position_dodge(width = 1),
    inherit.aes = TRUE
  ) +
  ylim(c(0, 25))
```

### Selection of specific Gene Sets

```{r}
selection20 <- data.frame(table(paper_20$`Selection of Gene Sets`))
colnames(selection20) <- c("Selection", "Frequency")

ggplot(selection20,aes(x = reorder(Selection,Frequency), y = Frequency, fill = Selection)) +
  geom_bar(stat ="identity") +
  scale_fill_manual(values = c("darkgrey", "#F8766D"),
                    aesthetics = "fill",
                    guide = F) +
  labs(x = "", title = "Only selected Gene Sets presented") +
  theme_bw(base_size = 18) + 
  coord_flip() +
  geom_text(
    aes(x = Selection, y = Frequency, label = Frequency), 
    hjust = -0.5, size = 5,
    position = position_dodge(width = 1),
    inherit.aes = TRUE
  ) +
  ylim(c(0, 17))

selection33 <- data.frame(table(paper_33$`Selection of Gene Sets`))
colnames(selection33) <- c("Selection", "Frequency")

ggplot(selection33,aes(x = reorder(Selection,Frequency), y = Frequency, fill = Selection)) +
  geom_bar(stat ="identity") +
  scale_fill_manual(values = c("darkgrey", "#F8766D"),
                    aesthetics = "fill",
                    guide = F) +
  labs(x = "", title = "Only selected Gene Sets presented") +
  theme_bw(base_size = 18) + 
  coord_flip() +
    geom_text(
    aes(x = Selection, y = Frequency, label = Frequency), 
    hjust = -0.5, size = 5,
    position = position_dodge(width = 1),
    inherit.aes = TRUE
  ) +
  ylim(c(0, 21))

selection44 <- data.frame(table(paper_44$`Selection of Gene Sets`))
colnames(selection44) <- c("Selection", "Frequency")

ggplot(selection44,aes(x = reorder(Selection,Frequency), y = Frequency, fill = Selection)) +
  geom_bar(stat ="identity") +
  scale_fill_manual(values = c( "darkgrey", "#F8766D"),
                    aesthetics = "fill",
                    guide = F) +
  labs(x = "", title = "Only selected Gene Sets presented") +
  theme_bw(base_size = 18) + 
  coord_flip() +
    geom_text(
    aes(x = Selection, y = Frequency, label = Frequency), 
    hjust = -0.5, size = 5,
    position = position_dodge(width = 1),
    inherit.aes = TRUE
  ) +
  ylim(c(0, 25))

```

### Reason for Selection

```{r}
justification20 <- data.frame(table(paper_20$`Justification of Selection`))
colnames(justification20) <- c("Justification", "Frequency")

ggplot(justification20,aes(x = reorder(Justification,Frequency), y = Frequency, fill = Justification)) +
  geom_bar(stat ="identity") +
  scale_fill_manual(values = c("Top" = "#FF65AC", "Not applicable" = "darkgrey",
                      "Not stated" = "lightgrey", "Rich Factor" =  "#00BBDA",
                      "p value" = "#8CAB00", "Number of DEG" =  "#BE9C00", 
                      "Literature" = "#E18A00", "Highest Enrichment" = "#F8766D"),
    
  
                    aesthetics = "fill",
                    guide = F) +
  labs(x = "", title = "Justification for Selection") +
  theme_bw(base_size = 18) + 
  coord_flip() +
    geom_text(
    aes(x = Justification, y = Frequency, label = Frequency), 
    hjust = -0.5, size = 5,
    position = position_dodge(width = 1),
    inherit.aes = TRUE
  )

justification33 <- data.frame(table(paper_33$`Justification of Selection`))
colnames(justification33) <- c("Justification", "Frequency")

ggplot(justification33,aes(x = reorder(Justification,Frequency), y = Frequency, fill = Justification)) +
  geom_bar(stat ="identity") +
  scale_fill_manual(values = c("#F8766D", "darkgrey", "lightgrey",
                               "#E18A00", "#8CAB00"),
                    aesthetics = "fill",
                    guide = F) +
  labs(x = "", title = "Justification for Selection") +
  theme_bw(base_size = 18) + 
  coord_flip() +
    geom_text(
    aes(x = Justification, y = Frequency, label = Frequency), 
    hjust = -0.5, size = 5,
    position = position_dodge(width = 1),
    inherit.aes = TRUE
  ) +
  ylim(c(0,15))

justification44 <- paper_44$`Justification of Selection`
justification44 <- unlist(strsplit(justification44, ",\\s*"))
justification44 <- data.frame(table(justification44))
colnames(justification44) <- c("Justification", "Frequency")

ggplot(justification44,aes(x = reorder(Justification,Frequency), y = Frequency, fill = Justification)) +
  geom_bar(stat ="identity") +
  scale_fill_manual(values = c("#F8766D", "#E18A00", "#FF65AC", "darkgrey",
                               "lightgrey", "#00ACFC", "#BE9C00", "#8CAB00"),
                    aesthetics = "fill",
                    guide = F) +
  labs(x = "", title = "Justification for Selection") +
  theme_bw(base_size = 18) + 
  coord_flip() +
   geom_text(
    aes(x = Justification, y = Frequency, label = Frequency), 
    hjust = -0.5, size = 5,
    position = position_dodge(width = 1),
    inherit.aes = TRUE
  ) + 
  ylim(c(0, 25))

```

### Summary Figure

```{r}
columns_to_check <- c(
  "Degree of Detail on the Enrichment Method",
  "Gene Set Library used",
  "Statistical Test or Function",
  "Software used",
  "Table of Results provided",
  "Visualisation",
  "Aggregation of Results",
  "Selection of Gene Sets",
  "Justification of Selection"
)

checkPaperScore <- function(paper, columns){
  n_paper <- nrow(paper)
  summery_score <- c(rep(0, n_paper))
  
  for(i in 1:n_paper) {
  entry <- paper[i, ]
  score <- 0
  
  # Check each column for "Not stated" or "Not applicable"
  for (col in columns_to_check) {
    if(col != "Degree of Detail on the Enrichment Method"){
      if(!is.null(entry[[col]]) && !(entry[[col]] %in% c("Not stated", "Not applicable", "FALSE"))) {
        score <- score + 1
      }
    }else{
      if(!is.null(entry[[col]]) && entry[[col]] == "Detailed") {
        score <- score + 1
      }
    }
    }
  summery_score[i] <- score
  }
  return(summery_score)
}


summery_score_20 <- checkPaperScore(paper_20, columns_to_check)

summery_score_33 <- checkPaperScore(paper_33, columns_to_check)

#columns_to_check <- colnames(paper_44)[6:15]
#columns_to_check[1] <- "Details (at all) on enrichment analysis"

summery_score_44 <- checkPaperScore(paper_44, columns_to_check)

scores <- c(summery_score_20, summery_score_33, summery_score_44)
scores <- as.data.frame(scores[scores != 0])
colnames(scores)<- "Size"

p <- ggplot(scores, aes(x = .data$Size)) +
    geom_histogram(binwidth = 1, fill = "#0092AC") +
    theme_bw() +
  scale_x_continuous(limits = c(0, 9.5), breaks = 0:10) +
  xlab("Completeness Score") +
  ylab("Count")
  theme(panel.grid.major.x = element_blank(),
        axis.text = element_text(size = 5),
        axis.title.x = element_text(size = 7),
        axis.title.y = element_text(size = 7))
p
```





## Comparison of various Alpha paramters of Figure 37

```{r comparisonHeatmaps}
# Filter out gene sets that have fewer than 10 associated genes
filter <- sapply(genes, function(x) length(x) >= 10)
genesets_small <- genesets[!filter, ]  
genes_small <- prepareGenesetData(genesets_small) 

# Retrieve PPI information for the data 
string_db <- getStringDB(as.numeric(getId("Mus musculus")))
anno_df <- getAnnotation(string_db)
ppi <- getPPI(genes_small, string_db, anno_df)

# Calculate the Meet-Min (MM) distances
mm <- getMeetMinMatrix(genes_small)
# Set the row and column names of the MM matrix to be the names of the gene sets
rownames(mm) <- colnames(mm) <- genesets_small$Genesets

# Create a heatmap of the MM distance matrix without plot labels
d <- distanceHeatmap(mm, plot_labels = FALSE, title = "")

# Draw the heatmap and extract the row order for consistent comparison
ht <- draw(d)
row_order <- row_order(ht)

# Calculate pMM matrices for different alpha values (0, 0.5, and 1)
pMM0 <- getpMMMatrix(genes_small, ppi, alpha = 0)   
rownames(pMM0) <- colnames(pMM0) <- genesets_small$Genesets

pMM5 <- getpMMMatrix(genes_small, ppi, alpha = 0.5)
rownames(pMM5) <- colnames(pMM5) <- genesets_small$Genesets

pMM1 <- getpMMMatrix(genes_small, ppi, alpha = 1)   
rownames(pMM1) <- colnames(pMM1) <- genesets_small$Genesets

# Create and draw a heatmap for the original MM scores, keeping row order consistent
d <- distanceHeatmap(mm[row_order, row_order], 
                     plot_labels = F,
                     cluster_rows = F,
                     cluster_columns = F,
                     title = "")
draw(d)

# Create and draw a heatmap for pMM scores with alpha = 0
d <- distanceHeatmap(pMM0[row_order, row_order],
                     plot_labels = F,
                     cluster_rows = FALSE,
                     cluster_columns = FALSE,
                     title = "")
draw(d)

# Create and draw a heatmap for pMM scores with alpha = 0.5
d <- distanceHeatmap(pMM5[row_order, row_order],
                     plot_labels = F,
                     cluster_rows = FALSE,
                     cluster_columns = FALSE,
                     title = "")
draw(d)

# Create and draw a heatmap for pMM scores with alpha = 1
d <- distanceHeatmap(pMM1[row_order, row_order],
                     plot_labels = F,
                     cluster_rows = FALSE,
                     cluster_columns = FALSE,
                     title = "")
draw(d)

```

## Word Clouds of Figure 38

```{r}
# Calculate the Gene Ontology (GO) semantic distance matrix for the gene sets
goDistances <- goDistance(genesets$Genesets, species = "org.Mm.eg.db")

# Perform fuzzy clustering on the GO distance matrix.
fuzzy_clustering <- fuzzyClustering(seedFinding(goDistances, 0.5, 0.5), 0.5)

# Extract a cluster that represent T cell-related gene sets.
# The specific cluster indices were manually determined in a running `GeDi`
tCellCluster <- unique(unlist(fuzzy_clustering[c(7, 8, 14, 17, 20, 24, 28, 30, 45, 49, 50, 58, 62, 64, 65, 70, 72, 76, 81, 85, 92, 96, 103, 105, 106, 112, 114, 121, 124)]))

# Generate a word cloud visualization for the T cell-related gene sets.
enrichmentWordcloud(example_data[tCellCluster, ])

# Extract a cluster that are related to protein membrane processes.
# Again, specific cluster indices are manually selected.
proteinMembraneCluster <- unique(unlist(fuzzy_clustering[c(5, 6, 21, 23, 35, 43, 52, 54, 57, 69, 73, 80, 88, 102, 126)]))

# Generate a word cloud visualization for the protein membrane-related gene sets.
enrichmentWordcloud(example_data[proteinMembraneCluster, ])
```

## Clustering Results of Figures 39 to 41

```{r}
# Load precomputed distance matrices for the pMM and GO distance metrics
pMMDistances <- readRDS("pMM_Distances.RDS")
goDistances <- readRDS("GODistances.RDS")

# Apply Louvain clustering algorithm to the pMM distance matrix with clustering threshold 0.5
louvainpMM <- louvainClustering(pMMDistances, 0.5)

# Build a cluster graph from the Louvain clustering results for the pMM distances
g_louvainpMM <- buildClusterGraph(louvainpMM,
                                  genesets,
                                  genesets$Genesets,
                                  color_by = "Cluster")

# Visualize the Louvain clustering graph using the `visNetwork` package
visNetwork::visIgraph(g_louvainpMM) %>% 
  visNodes(color = list(border = "#545454")) %>%
  visEdges(color = list(border = "#545454"), width = 10)

# Apply Fuzzy clustering to the pMM distance matrix and
# visualisize the results afterwards
fuzzypMM <- fuzzyClustering(seedFinding(pMMDistances, 0.5, 0.5), 0.5)

g_fuzzypMM <- buildClusterGraph(fuzzypMM,
                                genesets,
                                genesets$Genesets,
                                color_by = "Cluster")

visNetwork::visIgraph(g_fuzzypMM) %>% 
  visNodes(color = list(border = "#545454")) %>%
  visEdges(color = list(border = "#545454"), width = 5)

# Apply Louvain clustering algorithm to the GO distance matrix
# Again, visualize the results afterwards
for(i in 1:100){
  louvainGO <- louvainClustering(goDistances, 0.5)
  print(paste("Round ", i, " Number Cluster ", length(louvainGO), sep = "" ))
}

louvainGO <- louvainClustering(goDistances, 0.5)

g_louvainGO <- buildClusterGraph(louvainGO,
                                 genesets,
                                 genesets$Genesets,
                                 color_by = "Cluster")

visNetwork::visIgraph(g_louvainGO) %>% 
  visNodes(color = list(border = "#545454")) %>%
  visEdges(color = list(border = "#545454"), width = 10)
```




# Session information {-}

```{r}
sessionInfo()
```


# Bibliography {-}
